Automatic Aspect Extraction from Scientific Texts

  • Conference paper

Abstract

Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts.
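As a rough illustration of what aspect extraction output might look like, the sketch below groups per-token aspect labels into contiguous spans. It is not the authors' implementation: the abstract does not state the labeling scheme, so the per-token label format, the "O" marker for tokens outside any aspect, and the function name `group_aspect_spans` are all assumptions made for this example. Only the aspect names (Task, Contribution, Method, Conclusion) come from the abstract.

```python
from typing import List, Tuple


def group_aspect_spans(
    tokens: List[str],
    labels: List[str],
    outside: str = "O",  # assumed marker for tokens that belong to no aspect
) -> List[Tuple[str, str]]:
    """Merge consecutive tokens that share an aspect label into (label, text) spans.

    Hypothetical post-processing step: it assumes a token classifier
    (for example a fine-tuned multilingual BERT) has already produced
    exactly one label per token.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        if label != current_label:
            # The label changed: close the previous span if it was an aspect.
            if current_label is not None and current_label != outside:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = label
            current_tokens = []
        current_tokens.append(token)

    # Close the final span, if any.
    if current_label is not None and current_label != outside:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Toy English example with hand-written labels (not model output).
tokens = ["We", "propose", "a", "new", "method", "for", "parsing"]
labels = ["O", "O", "Contribution", "Contribution", "Contribution", "O", "Task"]
spans = group_aspect_spans(tokens, labels)
```

With the toy input above, `spans` holds one Contribution span ("a new method") followed by one Task span ("parsing"). A scheme with explicit span boundaries or nested aspects would need a different grouping rule.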


Notes

  1. Population/Problem (P), Intervention (I), Comparison (C) and Outcome (O).

  2. The texts are originally in Russian and were translated into English only to provide examples in the paper.

Author information

Authors and Affiliations

  1. Novosibirsk State University, Novosibirsk, Russia

    Anna Marshalova & Elena Bruches

  2. A. P. Ershov Institute of Informatics Systems, Novosibirsk, Russia

    Elena Bruches & Tatiana Batura


Corresponding author

Correspondence to Anna Marshalova.

Editor information

Editors and Affiliations

  1. National Research University Higher School of Economics, Moscow, Russia

    Dmitry I. Ignatov

  2. Krasovskii Institute of Mathematics and Mechanics of Russian Academy of Sciences, Yekaterinburg, Russia

    Michael Khachay

  3. University of Oslo, Oslo, Norway

    Andrey Kutuzov

  4. American University of Armenia, Yerevan, Armenia

    Habet Madoyan

  5. Artificial Intelligence Research Institute, Moscow, Russia

    Ilya Makarov

  6. Universität Hamburg, Hamburg, Germany

    Irina Nikishina

  7. Skolkovo Institute of Science and Technology, Moscow, Russia

    Alexander Panchenko

  8. Mohamed bin Zayed University of Artificial Intelligence and Technology Innovation Institute, Abu Dhabi, United Arab Emirates

    Maxim Panov

  9. Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA

    Panos M. Pardalos

  10. National Research University Higher School of Economics, Nizhny Novgorod, Russia

    Andrey V. Savchenko

  11. Apptek, Aachen, Nordrhein-Westfalen, Germany

    Evgenii Tsymbalov

  12. Kazan Federal University and HSE University, Moscow, Russia

    Elena Tutubalina

  13. MTS AI, Moscow, Russia

    Sergey Zagoruyko

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Marshalova, A., Bruches, E., Batura, T. (2024). Automatic Aspect Extraction from Scientific Texts. In: Ignatov, D.I., et al. Recent Trends in Analysis of Images, Social Networks and Texts. AIST 2023. Communications in Computer and Information Science, vol 1905. Springer, Cham. https://doi.org/10.1007/978-3-031-67008-4_6
