- Perspective
Integrating explanation and prediction in computational social science
- Jake M. Hofman (ORCID: orcid.org/0000-0002-9364-9604),
- Duncan J. Watts,
- Susan Athey (ORCID: orcid.org/0000-0001-6934-562X),
- Filiz Garip,
- Thomas L. Griffiths,
- Jon Kleinberg,
- Helen Margetts,
- Sendhil Mullainathan,
- Matthew J. Salganik,
- Simine Vazire,
- Alessandro Vespignani (ORCID: orcid.org/0000-0003-3419-4205) &
- Tal Yarkoni (ORCID: orcid.org/0000-0002-6558-5113)
Nature volume 595, pages 181–188 (2021)
Abstract
Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions—the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes—and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.
Author information
These authors contributed equally: Jake M. Hofman, Duncan J. Watts
Authors and Affiliations
Microsoft Research, New York, NY, USA
Jake M. Hofman
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
Duncan J. Watts
The Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA, USA
Duncan J. Watts
Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA
Duncan J. Watts
Graduate School of Business, Stanford University, Stanford, CA, USA
Susan Athey
Department of Sociology, Princeton University, Princeton, NJ, USA
Filiz Garip & Matthew J. Salganik
Department of Psychology, Princeton University, Princeton, NJ, USA
Thomas L. Griffiths
Department of Computer Science, Princeton University, Princeton, NJ, USA
Thomas L. Griffiths
Department of Computer Science, Cornell University, Ithaca, NY, USA
Jon Kleinberg
Department of Information Science, Cornell University, Ithaca, NY, USA
Jon Kleinberg
Oxford Internet Institute, University of Oxford, Oxford, UK
Helen Margetts
Public Policy Programme, The Alan Turing Institute, London, UK
Helen Margetts
Booth School of Business, University of Chicago, Chicago, IL, USA
Sendhil Mullainathan
Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
Simine Vazire
Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, USA
Alessandro Vespignani
Department of Psychology, University of Texas at Austin, Austin, TX, USA
Tal Yarkoni
Contributions
J.M.H. and D.J.W. conceptualized and helped to write and prepare the manuscript. They contributed equally to these efforts. All authors were involved in and discussed the structure of the manuscript at various stages of its development.
Corresponding authors
Correspondence to Jake M. Hofman or Duncan J. Watts.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature thanks Noortje Marres, Melanie Mitchell and Scott Page for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Hofman, J. M., Watts, D. J., Athey, S. et al. Integrating explanation and prediction in computational social science. Nature 595, 181–188 (2021). https://doi.org/10.1038/s41586-021-03659-0


