Nature
  • Perspective
  • Published:

Integrating explanation and prediction in computational social science

Nature volume 595, pages 181–188 (2021)

Abstract

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions—the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes—and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.

Author information

Author notes
  1. These authors contributed equally: Jake M. Hofman, Duncan J. Watts

Authors and Affiliations

  1. Microsoft Research, New York, NY, USA

    Jake M. Hofman

  2. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA

    Duncan J. Watts

  3. The Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA, USA

    Duncan J. Watts

  4. Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA

    Duncan J. Watts

  5. Graduate School of Business, Stanford University, Stanford, CA, USA

    Susan Athey

  6. Department of Sociology, Princeton University, Princeton, NJ, USA

    Filiz Garip & Matthew J. Salganik

  7. Department of Psychology, Princeton University, Princeton, NJ, USA

    Thomas L. Griffiths

  8. Department of Computer Science, Princeton University, Princeton, NJ, USA

    Thomas L. Griffiths

  9. Department of Computer Science, Cornell University, Ithaca, NY, USA

    Jon Kleinberg

  10. Department of Information Science, Cornell University, Ithaca, NY, USA

    Jon Kleinberg

  11. Oxford Internet Institute, University of Oxford, Oxford, UK

    Helen Margetts

  12. Public Policy Programme, The Alan Turing Institute, London, UK

    Helen Margetts

  13. Booth School of Business, University of Chicago, Chicago, IL, USA

    Sendhil Mullainathan

  14. Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia

    Simine Vazire

  15. Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, USA

    Alessandro Vespignani

  16. Department of Psychology, University of Texas at Austin, Austin, TX, USA

    Tal Yarkoni

Authors
  1. Jake M. Hofman
  2. Duncan J. Watts
  3. Susan Athey
  4. Filiz Garip
  5. Thomas L. Griffiths
  6. Jon Kleinberg
  7. Helen Margetts
  8. Sendhil Mullainathan
  9. Matthew J. Salganik
  10. Simine Vazire
  11. Alessandro Vespignani
  12. Tal Yarkoni

Contributions

J.M.H. and D.J.W. conceptualized and helped to write and prepare the manuscript. They contributed equally to these efforts. All authors were involved in and discussed the structure of the manuscript at various stages of its development.

Corresponding authors

Correspondence to Jake M. Hofman or Duncan J. Watts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature thanks Noortje Marres, Melanie Mitchell and Scott Page for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hofman, J.M., Watts, D.J., Athey, S. et al. Integrating explanation and prediction in computational social science. Nature 595, 181–188 (2021). https://doi.org/10.1038/s41586-021-03659-0
