The Upworthy Research Archive is an open dataset of thousands of A/B tests of headlines conducted by Upworthy from January 2013 to April 2015. At the time of release, it was the largest open-access collection of randomized behavioral studies available for research and education. We hope it doesn’t stay that way for long (see below if you wish to contribute data).
We have published a data descriptor with Nature Scientific Data that fully describes the dataset and includes validation information:
We hope this dataset will be used in three ways: to conduct academic research, to serve as a resource for educators, and to inform how organizations implement A/B tests.
We expect that this dataset will help advance knowledge in many fields, including:
We suggest the following references:
The About the Archive page contains a full description of the data, references, and slides.
Multiple comparisons and overfitting pose serious risks to scientific understanding with a dataset of this size. By doing the extra work of supporting cross-validation, we hope to maximize the amount of highly credible science that results from this dataset. For that reason, we have structured the data to support researchers who develop Registered Reports for their research projects.
We are providing an Exploratory Dataset of 4,873 experiments to support academic research and teaching. For researchers who plan to conduct confirmatory research that tests hypotheses, we are keeping a larger Confirmatory Dataset of 22,743 experiments in reserve. Until we released the dataset publicly in August 2021, we also retained a holdout dataset for a meta-scientific study with the Center for Open Science.
The full dataset, with all parts, may now be accessed on the Open Science Framework at https://osf.io/jd64p/
To learn more, see the Center for Open Science introduction to Registered Reports.
In general, Registered Reports are a form of “results-blind” peer review: a journal evaluates a submission on whether its proposed analysis strategy is appropriate for addressing the theoretical question, before any results are known.
With the Upworthy Research Archive, researchers can use the Exploratory Dataset to understand the structure of the data and write code to analyze it. A journal then reviews the scientific merit of the Registered Report; if it agrees to publish the report, the same code can be run on the Confirmatory Dataset to produce the final results.
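As a rough illustration of that workflow, here is a minimal Python sketch. It is not code from the archive: the field names `test_id`, `impressions`, and `clicks`, the function names, and the statistic itself (the average gap between the best- and worst-performing headline within each test) are hypothetical choices made for this example, so check the data descriptor for the archive's actual columns. The point it shows is that the analysis is written once against exploratory rows and then run unchanged on confirmatory rows.

```python
from collections import defaultdict
from typing import Dict, List, Union

# Hypothetical row schema: one row per headline variant ("package") in a test.
# The keys test_id, impressions, and clicks are assumptions for this sketch,
# not the archive's documented column names.
Row = Dict[str, Union[str, int]]


def click_rates_by_test(rows: List[Row]) -> Dict[str, List[float]]:
    """Group rows by test and compute each variant's click-through rate."""
    rates: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        impressions = int(row["impressions"])
        if impressions > 0:  # skip variants that were never shown
            rates[str(row["test_id"])].append(int(row["clicks"]) / impressions)
    return dict(rates)


def preregistered_analysis(rows: List[Row]) -> float:
    """Hypothetical pre-specified statistic: the mean gap between the highest
    and lowest click-through rate within each test that has 2+ variants."""
    gaps = [max(r) - min(r) for r in click_rates_by_test(rows).values() if len(r) >= 2]
    return sum(gaps) / len(gaps) if gaps else 0.0


# Step 1: develop and debug the analysis on (made-up) exploratory rows.
exploratory: List[Row] = [
    {"test_id": "a", "impressions": 1000, "clicks": 20},
    {"test_id": "a", "impressions": 1000, "clicks": 30},
    {"test_id": "b", "impressions": 500, "clicks": 5},
    {"test_id": "b", "impressions": 500, "clicks": 10},
]
print(preregistered_analysis(exploratory))

# Step 2 (only after the journal accepts the report): call the same,
# unmodified function on the confirmatory rows, e.g.
# preregistered_analysis(confirmatory)
```

Because `preregistered_analysis` takes the rows as its only input, the function the journal reviewed can later be pointed at the Confirmatory Dataset without edits, which is the property a Registered Report relies on.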
To date, 242 academic journals have published Registered Reports. The full list can be found here, under the “Participating Journals” tab.
We live in a time of unprecedented behavioral research by news publishers, advertisers, and tech companies. By donating your historical A/B tests, you can contribute to education and to breakthroughs across multiple scientific fields.
Our team can help you assess the potential scientific value of your archives and chart a privacy-preserving way to contribute to open knowledge. Please contact J. Nathan Matias <firstname.lastname@example.org> if this interests you.
J. Nathan Matias has developed a first draft of open educational resources for undergraduate classes. Please contact Nathan if you are interested in using the archive in your class.
Kevin, Nathan, and Marianne are grateful to the following people and organizations for providing key support and input in the creation of the Upworthy Research Archive.
While we are very grateful to Good/Upworthy for donating this data for scientific and educational purposes, this project is independent of the company. This website does not speak for Good/Upworthy in any way.