The Upworthy Research Archive

J. Nathan Matias, Assistant Professor, Cornell University: co-lead (ongoing contact)
Kevin Munger, Assistant Professor, Penn State University: co-lead (ongoing contact)
Marianne Aubin Le Quere, PhD student, Cornell University: data validation and documentation
Charles Ebersole, Postdoc, University of Virginia: data controller

The Upworthy Research Archive is an open dataset of thousands of A/B tests of headlines conducted by Upworthy from January 2013 to April 2015. At the time of release, it was the largest open-access collection of randomized behavioral studies available for research and education. We hope it doesn’t stay that way for long (see below if you wish to contribute data).

We have published a data descriptor with Nature Scientific Data that fully describes the dataset and includes validation information:

Matias, J., Munger, K., Aubin Le Quere, M., Ebersole, C. (2021) The Upworthy Research Archive, a time series of 32,487 experiments in U.S. media. Nature Scientific Data.

News and Updates

Critical update June 2024: Ensuring Reliable Science from Platform A/B Test Archives - an Update to the Upworthy Archive
Original project announcement: Announcing the Upworthy Research Archive: Help us advance human understanding by studying this massive dataset of headline A/B tests

What can I do with the Upworthy Research Archive?

We hope this dataset will be used in three ways: to conduct academic research, to serve as a resource for educators, and to inform the implementation of A/B tests by organizations.

We expect that this dataset will help advance knowledge in many fields, including:

Political Science, Communication, Psychology, and Marketing theories on the language that influences people to click on articles
Organizational Behavior research on how firms learn over time (or not) through experimentation
Statistical advances on the analysis of experiments
Computer Science research in machine learning and cybersecurity
Meta-scientific questions about the knowledge produced by behavioral experiments and how well it predicts future outcomes

How can I learn more about Upworthy?

We suggest the following references:

Karpf, D. (2016). Analytic activism: Digital listening and the new political strategy. Oxford University Press.
- David Karpf’s book places Upworthy in the context of contemporary U.S. politics. It includes a chapter based on interviews with Upworthy staff.
Fitts, A. S. (2014). The king of content: How Upworthy aims to alter the Web, and could end up altering the world. Columbia Journalism Review.
- This article, published partway through the period the dataset covers, describes how Upworthy’s founders and staff thought and talked about their work in public.
Matias, J.N., Munger, K. (2019) The Upworthy Research Archive: A Time Series of 32,488 Experiments in U.S. Advocacy. CODE@MIT Conference.

[Figure: Venn diagram of Upworthy's focus on awesome, meaningful, visual content]

What is the structure of the data?

The About the Archive page contains a full description of the data, along with references and slides.

Confirmatory Research with the Upworthy Research Archive

Multiple comparisons and overfitting represent serious risks to scientific understanding with a dataset of this size. By doing the extra work of supporting cross-validation, we hope to maximize the amount of highly credible science that results from this dataset. For that reason, we have structured the data to help researchers develop Registered Reports for their research projects.

We are providing an Exploratory Dataset of 4,873 experiments to support academic research and teaching. For researchers who plan to conduct confirmatory research that tests hypotheses, we are keeping a larger Confirmatory Dataset of 22,743 experiments in reserve. Until we released the full dataset publicly in August 2021, we also retained a holdout dataset for a meta-scientific study with the Center for Open Science.

The full dataset, with all parts, may now be accessed on the Open Science Framework at https://osf.io/jd64p/
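Multiple comparisons are one of the risks named above. As a hedged illustration only, the sketch below compares two headline variants of a single test with a pooled two-proportion z-test, then applies the Benjamini-Hochberg procedure to control the false discovery rate across many tests. The `Arm` type, the function names, and the numbers in the usage example are hypothetical and do not come from the archive. Real tests in the archive may include more than two headlines, so a full analysis would also need to decide in advance which comparisons to make.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    """One headline variant in a test (hypothetical structure, not the archive schema)."""
    headline: str
    impressions: int
    clicks: int


def two_proportion_p_value(a: Arm, b: Arm) -> float:
    """Two-sided p-value for H0: both arms share the same click-through rate (pooled z-test)."""
    pooled = (a.clicks + b.clicks) / (a.impressions + b.impressions)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.impressions + 1 / b.impressions))
    if se == 0:
        # No variation at all (e.g. zero clicks in both arms): no evidence of a difference.
        return 1.0
    z = (a.clicks / a.impressions - b.clicks / b.impressions) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which hypotheses are rejected while controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose sorted rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


# Illustrative numbers only, not data from the archive.
tests = [
    (Arm("Headline A", 10_000, 150), Arm("Headline B", 10_000, 100)),
    (Arm("Headline A", 5_000, 60), Arm("Headline B", 5_000, 58)),
    (Arm("Headline A", 8_000, 90), Arm("Headline B", 8_000, 70)),
]
p_values = [two_proportion_p_value(a, b) for a, b in tests]
print(benjamini_hochberg(p_values))  # [True, False, False]
```

The step-up rule is what distinguishes Benjamini-Hochberg from testing each p-value against a fixed cutoff: once the largest qualifying rank is found, every smaller p-value is rejected too, even one that missed its own rank's threshold.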

Impact

The following are some of the scholarly projects that cite or draw on the Upworthy Archive:

Scientific Studies Using the Archive

Robertson, C. E., Pröllochs, N., Schwarzenegger, K., Pärnamets, P., Van Bavel, J. J., & Feuerriegel, S. (2023). Negativity drives online news consumption. Nature Human Behaviour, 7(5), 812-822.
- Robertson, C. (2023) Two research teams submitted the same paper to Nature - You won’t BELIEVE what happens next!!. Springer Nature Research Communities.
- Benton, J. (2023) Negative words in news headlines generate more clicks — but sad words are more effective than angry or scary ones. Nieman Lab.
Larsen, N., Stallrich, J., Sengupta, S., Deng, A., Kohavi, R., & Stevens, N. T. (2024). Statistical challenges in online controlled experiments: A review of A/B testing methodology. The American Statistician, 78(2), 135-149.
Shulman, H. C., Markowitz, D. M., & Rogers, T. (2024). Reading dies in complexity: Online news consumers prefer simple writing. Science Advances, 10(23), eadn2555.
Gligorić, K., Lifchits, G., West, R., & Anderson, A. (2023). Linguistic effects on news headline success: Evidence from thousands of online field experiments (Registered Report). PLOS ONE, 18(3), e0281682.
Banerjee, A., & Urminsky, O. (2024). The language that drives engagement: a systematic large-scale analysis of headline experiments. Marketing Science.

Textbooks

Alexander, R. (2023). Telling Stories with Data: With Applications in R. Chapman and Hall/CRC.

Guidance on Science Overall

Robertson, C. E., Pröllochs, N., Schwarzenegger, K., Pärnamets, P., Van Bavel, J. J., & Feuerriegel, S. (2023). Negativity drives online news consumption. Nature Human Behaviour, 7(5), 812-822.
Polonioli, A., Ghioni, R., Greco, C., Juneja, P., Tagliabue, J., Watson, D., & Floridi, L. (2023). The Ethics of Online Controlled Experiments (A/B Testing). Minds and Machines, 1-27.

Statistical Advances

Wu, J. J., Mazzuchi, T. A., & Sarkani, S. (2023). Comparison of multi-criteria decision-making methods for online controlled experiments in a launch decision-making framework. Information and Software Technology, 155, 107115.

Other Data Releases

Crabtree, C., Kim, J. Y., Gaddis, S. M., Holbein, J. B., Guage, C., & Marx, W. W. (2023). Validated names for experimental studies on race and ethnicity. Scientific Data, 10(1), 130.

What is a Registered Report?

To learn more, see the Center for Open Science introduction to registered reports.

Generally, Registered Reports are a form of “results-blind” peer review: a journal evaluates the submission on how appropriate its analysis strategy is for addressing the theoretical question, before the results are known.

With the Upworthy Research Archive, researchers can use the Exploratory Dataset to understand the structure of the data and write code to analyze it. Journals then review the scientific merit of the Registered Report, and if they agree to publish it, the same code can be run on the Confirmatory Dataset to produce the final results.
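The workflow above can be sketched as a single analysis function that is written and debugged on the Exploratory Dataset and then run unchanged on the Confirmatory Dataset. This is a minimal, hypothetical example: the column names (`clickability_test_id`, `headline`, `impressions`, `clicks`), the question-mark feature, and the toy rows are assumptions for illustration, so check the data descriptor for the actual schema before adapting it. Comparing headlines within the same test, rather than across tests, keeps each comparison inside one randomized experiment.

```python
import math
import statistics
from collections import defaultdict


def has_question_mark(headline: str) -> bool:
    """Hypothetical headline feature, chosen only for illustration."""
    return "?" in headline


def within_test_effect(rows, feature=has_question_mark):
    """Mean within-test CTR difference between headlines with and without `feature`.

    `rows` is an iterable of dicts with the (assumed) keys 'clickability_test_id',
    'headline', 'impressions', and 'clicks'. Only tests that contain at least one
    headline of each kind contribute. Returns (mean difference, number of tests used).
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for row in rows:
        ctr = row["clicks"] / row["impressions"]
        groups[row["clickability_test_id"]][feature(row["headline"])].append(ctr)
    diffs = [
        statistics.mean(g[True]) - statistics.mean(g[False])
        for g in groups.values()
        if g[True] and g[False]
    ]
    if not diffs:
        return math.nan, 0
    return statistics.mean(diffs), len(diffs)


# Toy rows standing in for the Exploratory Dataset (not real archive data).
exploratory = [
    {"clickability_test_id": "t1", "headline": "Why do cats purr?", "impressions": 1000, "clicks": 20},
    {"clickability_test_id": "t1", "headline": "Cats purr", "impressions": 1000, "clicks": 10},
    {"clickability_test_id": "t2", "headline": "Can dogs smile?", "impressions": 2000, "clicks": 30},
    {"clickability_test_id": "t2", "headline": "Dogs can smile", "impressions": 2000, "clicks": 30},
    {"clickability_test_id": "t3", "headline": "Birds sing", "impressions": 500, "clicks": 5},
]

# 1. Develop and debug the analysis on the exploratory rows.
#    Test t3 has no contrasting headline, so it is skipped.
print(within_test_effect(exploratory))

# 2. After the Registered Report is accepted, run the same, unchanged function
#    on the Confirmatory Dataset, e.g. within_test_effect(confirmatory_rows).
```

Freezing the function before touching the confirmatory rows is the point of the design: any choice made after seeing confirmatory results would reintroduce the overfitting risk that the split is meant to prevent.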

To date, 242 academic journals have published Registered Reports. The full list is available from the Center for Open Science, under the “Participating Journals” tab.

I operate a publisher and want to add to the archive by donating our historical A/B tests

We live in a time of unprecedented behavioral research by news publishers, advertisers, and tech companies. By donating your historical A/B tests, you can contribute to education and to breakthroughs across multiple scientific fields.

Our team can help you assess the potential scientific value of your archives and chart a privacy-preserving way to contribute to open knowledge. Please contact J. Nathan Matias <nathan.matias@cornell.edu> if this interests you.

How can I use the archive with my students?

J. Nathan Matias has developed a first draft of open educational resources for undergraduate classes. Please contact Nathan if you are interested in using the archive for your class.

Advisory Board

Dr. Helen Margetts is Professor of Society and the Internet and Professorial Fellow at Mansfield College, Oxford.
Dr. David Karpf is an associate professor in the School of Media and Public Affairs at George Washington University.
Dr. Brian Nosek is a professor in the Department of Psychology at the University of Virginia.

Acknowledgments

Kevin, Nathan, and Marianne are grateful to the following people and organizations for providing key support and input in the creation of the Upworthy Research Archive.

Good/Upworthy
Andy Morris
Andrew Singh
Rajni Aneja
Matt Salganik
Jason Rhody

Disclaimer

While we are very grateful to Good/Upworthy for donating this data for scientific and educational purposes, this project is independent from the company. This website does not speak for Good/Upworthy in any way.