Abstract
This study describes diachronic changes in linguistic complexity (overall, morphological, and syntactic) in scientific writing using Kolmogorov complexity, an information-theoretic approach. We examined the entire data (all 24 text types, including articles, letters, and news) and two individual registers (the full texts and abstracts of articles) of Philosophical Transactions of the Royal Society of London, the world’s oldest scientific journal. Mann–Kendall trend tests were used to capture diachronic changes at the three complexity levels, and Pearson correlation coefficients were calculated to investigate the relationships between the three complexity metrics. Results showed that the overall and morphological complexity of both the entire data and the full texts increased from 1821 to 1920, indicating a massive lexical expansion over this 100-year period, as evidenced by a growing number of word form variants in scientific writing. In contrast, the syntactic complexity of the entire data and the full texts declined, suggesting a gradual shift towards grammatical simplification in the evolution of scientific writing, particularly in word order rules and syntactic patterns. A trade-off was also found between syntactic and morphological complexity in the entire data. In abstracts, by contrast, the overall and morphological complexity decreased while the syntactic complexity increased. These results can help researchers understand how linguistic complexity in scientific writing has changed and adjust their own writing accordingly to attract greater attention in academia.
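The three quantitative ingredients named in the abstract (a compression-based estimate of Kolmogorov complexity, the Mann–Kendall trend test, and Pearson correlation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe their exact procedure, so the use of gzip as the compressor, the function names, and the toy yearly values below are all assumptions made for illustration only.

```python
import gzip
import math
from collections import Counter


def compression_ratio(text: str) -> float:
    """Compressed size / raw size; a rough proxy for Kolmogorov complexity.
    Assumption: gzip is the compressor (the abstract does not name one)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(gzip.compress(raw)) / len(raw)


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test with tie correction.
    Returns (S, Z, p): S > 0 suggests an upward trend, S < 0 a downward one."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S counts later-minus-earlier sign agreements over all ordered pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    if s == 0 or var_s == 0:
        return s, 0.0, 1.0
    # Continuity correction: shrink |S| by 1 before standardising.
    z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, z, p


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy values invented for illustration; not data from the study.
    morphological = [0.91, 0.93, 0.92, 0.95, 0.97, 0.98]
    syntactic = [0.88, 0.87, 0.87, 0.85, 0.84, 0.82]
    print(mann_kendall(morphological))
    print(mann_kendall(syntactic))
    print(pearson(morphological, syntactic))
```

In this sketch, a rising series yields a positive S and a falling series a negative S, and a negative Pearson coefficient between two metrics would correspond to the kind of trade-off the abstract reports between syntactic and morphological complexity.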
Data availability
All data supporting the conclusions of this article are included within the article (and its additional files).
Funding
This research was supported by the National Social Science Foundation of China (No. 17BYY115).
Author information
Authors and Affiliations
Foreign Languages College, Shanghai Normal University, 100 Guilin Road, Xuhui, Shanghai, People’s Republic of China
Gui Wang, Hui Wang, Xinyi Sun & Li Wang
School of International Chinese Studies, Beijing Foreign Studies University, Beijing, People’s Republic of China
Nan Wang
Contributions
All authors contributed to the study conception and design. Material preparation and data collection were performed by SXY, and data processing was carried out by all authors. The first draft of the manuscript was written by WG, SXY, WN, and WH. All authors, especially WL, commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Li Wang.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
This study did not involve humans and/or animals; there is no need for institutional ethics review board approval.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, G., Wang, H., Sun, X. et al. Linguistic complexity in scientific writing: A large-scale diachronic study from 1821 to 1920. Scientometrics 128, 441–460 (2023). https://doi.org/10.1007/s11192-022-04550-z