The Upworthy Research Archive is a dataset of headline A/B tests conducted by Upworthy from early 2013 into April 2015. This page documents the archive and answers common questions. We have also published an academic paper that reports the details of the archive and our work to validate the data. Please cite this paper when using the archive.
You can download the archive on the Open Science Framework at osf.io/jd64p/.
For background on the Upworthy Archive, please consult and cite the following sources:
In June 2024, the team published an update with evidence of systematic randomization problems in 22% of A/B tests. Working with the authors of six peer-reviewed studies based on our work, we found that no prior findings were materially affected. We discourage confirmatory researchers from using experiments conducted between June 25, 2013 and January 10, 2014.
The following post includes more information and guidance for authors going forward. We have also updated the Data Descriptor at Scientific Data to reflect this update, and we have added the randomization_imbalance_risk column (documented below) to the data to indicate A/B tests that we believe are likely affected.
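For confirmatory work, one reasonable precaution is to drop the affected window and any flagged tests before specifying an analysis plan. The sketch below is a minimal illustration in pandas; the file name is a placeholder, and the created_at, clickability_test_id, and randomization_imbalance_risk column names and encodings are assumptions that should be checked against the OSF documentation.

```python
import pandas as pd

# Placeholder file name; use the package-level CSV downloaded from OSF.
packages = pd.read_csv("upworthy-packages.csv", parse_dates=["created_at"])

# Drop every package belonging to a test fielded in the discouraged window
# (June 25, 2013 through January 10, 2014), assuming created_at reflects
# when the package was fielded.
in_window = packages["created_at"].between("2013-06-25", "2014-01-10")
flagged = set(packages.loc[in_window, "clickability_test_id"])

# Also respect the randomization_imbalance_risk column if present.
# A boolean-style flag is assumed here; check the actual encoding.
if "randomization_imbalance_risk" in packages.columns:
    risk = packages["randomization_imbalance_risk"].fillna(False).astype(bool)
    flagged |= set(packages.loc[risk, "clickability_test_id"])

clean = packages[~packages["clickability_test_id"].isin(flagged)]
```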
The Upworthy Research Archive contains packages within tests. On Upworthy, packages are bundles of headlines and images that were randomly assigned to people on the website as part of a test. Tests can include many packages.
The archive only includes aggregate results on the number of viewers a package received and how many of those viewers clicked on that package. It does not include any individual-level information to differentiate between viewers.
This research archive includes valid tests conducted by Upworthy in the study period. We have omitted tests that were never shown to viewers (zero impressions) and packages that had missing test IDs.
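To make this structure concrete, here is a minimal sketch of loading the package-level file and grouping packages into tests. The file name is a placeholder, and the column names (clickability_test_id, headline, impressions, clicks) follow the column documentation below but should be verified against the downloaded data.

```python
import pandas as pd

# Placeholder file name for the package-level file from OSF.
packages = pd.read_csv("upworthy-exploratory-packages.csv")

# Each row is one package; packages sharing a clickability_test_id were
# shown to viewers within the same test. Only aggregate counts are available.
tests = packages.groupby("clickability_test_id").agg(
    n_packages=("headline", "size"),
    total_impressions=("impressions", "sum"),
    total_clicks=("clicks", "sum"),
)

print(tests["n_packages"].describe())  # how many packages a typical test contains
```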
To support reliable scholarly research and education, we are releasing the Upworthy Research Archive as a partial exploratory dataset. We will share a confirmatory dataset with researchers whose analysis plans have been peer reviewed (read more about the process).
The exploratory dataset includes 22,666 packages from 4,873 tests. The confirmatory dataset includes 105,551 additional packages from 22,743 tests.
To support time-series research, both datasets are random samples stratified by week number.
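To illustrate what a week-stratified sample looks like in practice, the sketch below draws the same fraction of tests from every week. It assumes a test_week column and the placeholder file and column names used above, all of which should be checked against the actual data.

```python
import pandas as pd

packages = pd.read_csv("upworthy-exploratory-packages.csv")  # placeholder file name

# One row per test with its week, then sample the same fraction within each
# week so that every week of the study period is represented.
test_weeks = packages[["clickability_test_id", "test_week"]].drop_duplicates()
sampled_ids = (
    test_weeks.groupby("test_week", group_keys=False)
    .apply(lambda g: g.sample(frac=0.10, random_state=1))
    ["clickability_test_id"]
)
subsample = packages[packages["clickability_test_id"].isin(sampled_ids)]
```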
We expect that many researchers will want to data-mine the archive for specific headline types and compare them to other headlines within the same tests. We created this task as a workshop and homework assignment for students in a Cornell class on the design and governance of experiments. Students were asked to meta-analyze the effect of including a notable person’s name in a headline, and the effect of including a number in a headline. We offer the materials below as food for thought when developing your own data-mining approach:
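Separate from those class materials, here is a minimal, unweighted sketch of one such within-test comparison: flag headlines containing a digit, compute each package's click-through rate, and average the difference within tests that contain both kinds of headline. Column names follow the documentation below and the file name is a placeholder; a real meta-analysis would weight tests and model uncertainty rather than taking a simple mean.

```python
import pandas as pd

packages = pd.read_csv("upworthy-exploratory-packages.csv")  # placeholder file name
packages["ctr"] = packages["clicks"] / packages["impressions"]
packages["has_number"] = packages["headline"].str.contains(r"\d", na=False)

def within_test_difference(test):
    # Mean CTR of number headlines minus mean CTR of the other headlines,
    # defined only for tests containing both kinds of package.
    if test["has_number"].nunique() < 2:
        return float("nan")
    return (
        test.loc[test["has_number"], "ctr"].mean()
        - test.loc[~test["has_number"], "ctr"].mean()
    )

diffs = packages.groupby("clickability_test_id").apply(within_test_difference).dropna()
print(f"{len(diffs)} usable tests, mean within-test CTR difference: {diffs.mean():.4f}")
```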
The dataset of packages contains the following columns:
Time-related columns:
Experiment-related columns: impressions are recorded for all packages that share the same clickability_test_id.
Stimuli shown to viewers:
Outcomes: the click-through rate of a package is its clicks divided by the number of impressions.
Miscellaneous columns that may be of interest. To our knowledge, none of these columns represent information shown to viewers as part of A/B tests:
Columns we learned about through conversations with former staff:
We have also been scraping Upworthy and the Internet Archive in search of supplementary information, including images. Since only some tests and packages can be supplemented in this way, we are doubtful that this data will be useful for confirmatory research. Please contact us if you think that these columns might be important to your research.