
Syntactic Dependency-Based N-grams as Classification Features

  • Conference paper

Abstract

In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how neighboring elements are defined: neighbors are taken by following syntactic relations in syntactic trees rather than by taking words in the order they appear in the text. Dependency trees fit this idea directly, while constituency trees require some simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier for several profile sizes. As a baseline we used traditional n-grams of words, POS tags, and characters. The results obtained with sn-grams are better than the baseline.
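
The idea of taking neighbors along syntactic relations can be illustrated with a minimal sketch. It is not the authors' implementation, which is not visible in this preview. It assumes that an sn-gram is a path of n words obtained by following head-to-dependent arcs in a dependency tree. The function name sn_grams, the head-index encoding, and the hand-written parse of the example sentence are all hypothetical choices made for illustration.

```python
def sn_grams(words, heads, n):
    """Return sn-grams of size n as tuples of words.

    Assumption (not taken from the paper): heads[i] is the index of the
    head of word i, or -1 for the root, and an sn-gram is a downward
    path of n nodes that follows head -> dependent arcs.
    """
    # Build the list of dependents for every word.
    children = {i: [] for i in range(len(words))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths_from(node, length):
        # All downward paths with exactly `length` nodes starting at `node`.
        if length == 1:
            return [(node,)]
        return [
            (node,) + rest
            for child in children[node]
            for rest in paths_from(child, length - 1)
        ]

    result = []
    for start in range(len(words)):
        for path in paths_from(start, n):
            result.append(tuple(words[i] for i in path))
    return result


# Hand-written dependency parse (an assumption, not parser output):
# "read" is the root; "I" and "book" depend on "read";
# "an" and "interesting" depend on "book".
WORDS = ["I", "read", "an", "interesting", "book"]
HEADS = [1, -1, 4, 4, 1]

# Traditional bigrams use surface order; sn-bigrams use tree arcs.
surface_bigrams = list(zip(WORDS, WORDS[1:]))
syntactic_bigrams = sn_grams(WORDS, HEADS, 2)

print(surface_bigrams)
print(syntactic_bigrams)
```

In this example the surface bigram ("read", "an") pairs two words with no direct syntactic relation, while the sn-bigram ("read", "book") links a verb and its object even though they are not adjacent in the text.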



Author information

Authors and Affiliations

  1. Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico

    Grigori Sidorov, Francisco Velasquez & Alexander Gelbukh

  2. University of the Aegean, Greece

    Efstathios Stamatatos

  3. ESIME, Instituto Politécnico Nacional (IPN), Mexico City, Mexico

    Liliana Chanona-Hernández


Editor information

Editors and Affiliations

  1. Mexican Petroleum Institute, Eje Central Lazaro Cardenas Norte, 152, Col. San Bartolo Atepehuacan, CP 07730, México D.F., Mexico

    Ildar Batyrshin

  2. Tecnológico de Monterrey, Campus Estado de México, Carretera Lago de Guadalupe Km 3.5, CP 52926, Atizapán de Zaragoza, Estado de México, Mexico

    Miguel González Mendoza

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., Chanona-Hernández, L. (2013). Syntactic Dependency-Based N-grams as Classification Features. In: Batyrshin, I., Mendoza, M.G. (eds) Advances in Computational Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37798-3_1
