Movatterモバイル変換

Part of the book series:Communications in Computer and Information Science ((CCIS,volume 1905))

Included in the following conference series:

International Conference on Analysis of Images, Social Networks and Texts

168Accesses

Abstract

Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available athttps://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.

This is a preview of subscription content,log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Chapter: JPY 3498; Price includes VAT (Japan)

eBook: JPY 12583; Price includes VAT (Japan)

Softcover Book: JPY 10724; Price includes VAT (Japan)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Leveraging peer-review aspects for extractive and abstractive summarization of scientific articles

ArticleOpen access18 October 2024

Aspect based citation sentiment analysis using linguistic patterns for better comprehension of scientific knowledge

Article28 February 2019

Approaches based on language models for aspect extraction for sentiment analysis in the Portuguese language

Article03 August 2024

Notes

1.
Population/Problem (P), Intervention (I), Comparison (C) and Outcome (O).
2.
The texts are originally in Russian and were translated into English only to provide examples in the paper.
3.
https://huggingface.co/cointegrated/rubert-tiny2.

References

Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.: SemEval 2017 task 10: ScienceIE - extracting keyphrases and relations from scientific publications. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, Canada, pp. 546–555. Association for Computational Linguistics (2017)
Google Scholar
Batura, T., Bakiyeva, A., Charintseva, M.: A method for automatic text summarization based on rhetorical analysis and topic modeling. Int. J. Comput.19(1), 118–127 (2020)
Article Google Scholar
Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3615–3620. Association for Computational Linguistics (2019)
Google Scholar
Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media, Inc. (2009)
Google Scholar
Blinov, P., Reshetnikova, A., Nesterov, A., Zubkova, G., Kokh, V.: Rumedbench: a Russian medical language understanding benchmark. In: Michalowski, M., Abidi, S.S.R., Abidi, S. (eds) AIME 2022. LNCS, vol. 13263, pp. 383–392. Springer, Cham (2022).https://doi.org/10.1007/978-3-031-09342-5_38
Boudin, F., Nie, J.Y., Bartlett, J.C., Grad, R., Pluye, P., Dawes, M.: Combining classifiers for robust PICO element detection. BMC Med. Inform. Decis. Mak.10(1), 1–6 (2010)
Article Google Scholar
Bruches, E., Pauls, A., Batura, T., Isachenko, V.: Entity recognition and relation extraction from scientific and technical texts in Russian. In: 2020 Science and Artificial Intelligence Conference (SAI ence), pp. 41–45. IEEE (2020)
Google Scholar
Dernoncourt, F., Lee, J.Y.: PubMed 200k RCT: a dataset for sequential sentence classification in medical abstracts. In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). pp. 308–313, Taipei, Taiwan. Asian Federation of Natural Language Processing (2017)
Google Scholar
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171–4186. Association for Computational Linguistics (2019)
Google Scholar
Dudchenko, A., Dudchenko, P., Ganzinger, M., Kopanitsa, G.D.: Extraction from medical records. In: pHealth, pp. 62–67 (2019)
Google Scholar
Gavrilov, D., Gusev, A., Korsakov, I., Novitsky, R., Serova, L.: Feature extraction method from electronic health records in Russia. In: Conference of Open Innovations Association, FRUCT, pp. 497–500. FRUCT Oy (2020)
Google Scholar
Gerasimenko, N., Chernyavsky, A., Nikiforova, M.: ruSciBERT: a transformer language model for obtaining semantic embeddings of scientific texts in Russian. Doklady Mathematics106, S95–S96 (2022)
Article Google Scholar
Gonçalves, S., Cortez, P., Moro, S.: A deep learning classifier for sentence classification in biomedical and computer science abstracts. Neural Comput. Appl.32, 6793–6807 (2020)
Article Google Scholar
Gupta, S., Manning, C.: Analyzing the dynamics of research by extracting key aspects of scientific papers. In: Proceedings of 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, Chiang Mai, Thailand, pp. 1–9 (2011)
Google Scholar
Hassanzadeh, H., Groza, T., Hunter, J.: Identifying scientific artefacts in biomedical literature: the evidence based medicine use case. J. Biomed. Inform.49, 159–170 (2014)
Article Google Scholar
Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017), to appear
Google Scholar
Hripcsak, G., Rothschild, A.S.: Agreement, the f-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc.12(3), 296–298 (2005)
Article Google Scholar
Huang, T.H.K., Huang, C.Y., Ding, C.K.C., Hsu, Y.C., Giles, C.L.: CODA-19: using a non-expert crowd to annotate research aspects on 10,000+ abstracts in the COVID-19 open research dataset. In: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020. Association for Computational Linguistics (2020)
Google Scholar
Jain, S., van Zuylen, M., Hajishirzi, H., Beltagy, I.: SciREX: a challenge dataset for document-level information extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7506–7516. Association for Computational Linguistics (2020)
Google Scholar
Kim, S.N., Martinez, D., Cavedon, L., Yencken, L.: Automatic classification of sentences to support evidence based medicine. BMC Bioinformatics, vol. 12, pp. 1–10. BioMed Central (2011)
Google Scholar
Kivotova, E., Maksudov, B., Kuleev, R., Ibragimov, B.: Extracting clinical information from chest x-ray reports: a case study for Russian language. In: 2020 International Conference Nonlinearity, Information and Robotics (NIR), pp. 1–6. IEEE (2020)
Google Scholar
Korobov, M.: Morphological analyzer and generator for Russian and Ukrainian languages. In: Khachay, M.Y., Konstantinova, N., Panchenko, A., Ignatov, D.I., Labunets, V.G. (eds.) AIST 2015. CCIS, vol. 542, pp. 320–332. Springer, Cham (2015).https://doi.org/10.1007/978-3-319-26123-2_31
Chapter Google Scholar
Kuratov, Y., Arkhipov, M.: Adaptation of deep bidirectional multilingual transformers for Russian language. In: Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference «Dialog», Moscow, May 29 – June 1, 2019, Proceedings, pp. 333–339 (2019)
Google Scholar
Loukachevitch, N., et al.: Nerel-bio: a dataset of biomedical abstracts annotated with nested named entities. Bioinformatics39(4), btad161 (2023)
Google Scholar
Miftahutdinov, Z., Alimova, I., Tutubalina, E.: On biomedical named entity recognition: experiments in interlingual transfer for clinical and social media texts. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12036, pp. 281–288. Springer, Cham (2020).https://doi.org/10.1007/978-3-030-45442-5_35
Chapter Google Scholar
Nasar, Z., Jaffry, S.W., Malik, M.K.: Information extraction from scientific articles: a survey. Scientometrics117, 1931–1990 (2018)
Article Google Scholar
Nesterov, A., et al.: RuCCoN: clinical concept normalization in Russian. In: Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, pp. 239–245. Association for Computational Linguistics (2022)
Google Scholar
Noreen, E.W.: Computer-Intensive Methods for Testing Hypotheses. Wiley, New York (1989)
Google Scholar
Ronzano, F., Saggion, H.: Dr. inventor framework: extracting structured information from scientific publications. In: Japkowicz, N., Matwin, S. (eds.) DS 2015. LNCS (LNAI), vol. 9356, pp. 209–220. Springer, Cham (2015).https://doi.org/10.1007/978-3-319-24282-8_18
Chapter Google Scholar
Shang, X., Ma, Q., Lin, Z., Yan, J., Chen, Z.: A span-based dynamic local attention model for sequential sentence classification. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 198–203. Association for Computational Linguistics (2021)
Google Scholar
Shelmanov, A., Smirnov, I., Vishneva, E.: Information extraction from clinical texts in Russian. In: Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference «Dialog», Moscow, May 27–30, 2015, Proceedings, pp. 560–572 (2015)
Google Scholar
Sirotina, A., Loukachevitch, N.: Named entity recognition in information security domain for Russian. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pp. 1114–1120 (2019)
Google Scholar
Skvortsova, I.A.: Russian language among the world languages. In: VIII Vinogradov Conference, pp. 171–173 (2022)
Google Scholar
Teufel, S., et al.: Argumentative zoning: information extraction from scientific text. Ph.D. thesis, Citeseer (1999)
Google Scholar
Tikhomirov, M., Loukachevitch, N., Sirotina, A., Dobrov, B.: Using BERT and augmentation in named entity recognition for cybersecurity domain. In: Métais, E., Meziane, F., Horacek, H., Cimiano, P. (eds.) NLDB 2020. LNCS, vol. 12089, pp. 16–24. Springer, Cham (2020).https://doi.org/10.1007/978-3-030-51310-8_2
Chapter Google Scholar
Yamada, K., Hirao, T., Sasano, R., Takeda, K., Nagata, M.: Sequential span classification with neural semi-Markov CRFs for biomedical abstracts. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 871–877. Association for Computational Linguistics, Online (2020)
Google Scholar
Zhang, C., Xiang, Y., Hao, W., Li, Z., Qian, Y., Wang, Y.: Automatic recognition and classification of future work sentences from academic articles in a specific domain. J. Informet.17(1), 101373 (2023)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Novosibirsk State University, Novosibirsk, Russia
Anna Marshalova & Elena Bruches
A. P. Ershov Institute of Informatics Systems, Novosibirsk, Russia
Elena Bruches & Tatiana Batura

Authors

Anna Marshalova
View author publications
You can also search for this author inPubMed Google Scholar
Elena Bruches
View author publications
You can also search for this author inPubMed Google Scholar
Tatiana Batura
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence toAnna Marshalova.

Editor information

Editors and Affiliations

National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov
Krasovskii Institute of Mathematics and Mechanics of Russian Academy of Sciences, Yekaterinburg, Russia
Michael Khachay
University of Oslo, Oslo, Norway
Andrey Kutuzov
American University of Armenia, Yerevan, Armenia
Habet Madoyan
Artificial Intelligence Research Institute, Moscow, Russia
Ilya Makarov
Universität Hamburg, Hamburg, Germany
Irina Nikishina
Skolkovo Institute of Science and Technology, Moscow, Russia
Alexander Panchenko
Mohamed bin Zayed University of Artificial Intelligence and Technology Innovation Institute, Abu Dhabi, United Arab Emirates
Maxim Panov
Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA
Panos M. Pardalos
National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Apptek, Aachen, Nordrhein-Westfalen, Germany
Evgenii Tsymbalov
Kazan Federal University and HSE University, Moscow, Russia
Elena Tutubalina
MTS AI, Moscow, Russia
Sergey Zagoruyko

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Marshalova, A., Bruches, E., Batura, T. (2024). Automatic Aspect Extraction from Scientific Texts. In: Ignatov, D.I.,et al. Recent Trends in Analysis of Images, Social Networks and Texts. AIST 2023. Communications in Computer and Information Science, vol 1905. Springer, Cham. https://doi.org/10.1007/978-3-031-67008-4_6

Download citation

DOI:https://doi.org/10.1007/978-3-031-67008-4_6
Published:30 July 2024
Publisher Name:Springer, Cham
Print ISBN:978-3-031-67007-7
Online ISBN:978-3-031-67008-4
eBook Packages:Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Movatterモバイル変換

Automatic Aspect Extraction from Scientific Texts

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Leveraging peer-review aspects for extractive and abstractive summarization of scientific articles

Aspect based citation sentiment analysis using linguistic patterns for better comprehension of scientific knowledge

Approaches based on language models for aspect extraction for sentiment analysis in the Portuguese language

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Access this chapter

Subscribe and save

Buy Now