- Grigori Sidorov,
- Francisco Velasquez,
- Efstathios Stamatatos,
- Alexander Gelbukh &
- Liliana Chanona-Hernández
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7630)
Abstract
In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how neighboring elements are determined: neighbors are found by following syntactic relations in syntactic trees rather than by taking words in the order they appear in the text. Dependency trees fit this idea directly, whereas constituency trees require some simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier with several profile sizes. As baselines we used traditional n-grams of words, POS tags, and characters. The results obtained with sn-grams were better than those of the baselines.
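The difference between the two kinds of n-grams can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the head-index encoding of the dependency tree, the head-to-dependent reading direction, and the hand-written parse of the example sentence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Traditional n-grams: neighbors are adjacent words in the text."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tokens: List[str], heads: List[int], n: int) -> List[Tuple[str, ...]]:
    """Sn-grams (sketch): neighbors are found by following dependency arcs.

    heads[i] is the index of the head of token i, or -1 for the root.
    Each sn-gram is a downward path of n tokens, read from head to
    dependent (an assumed convention for this illustration).
    """
    # Build a head -> dependents map from the head-index encoding.
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths_from(node: int, length: int) -> List[List[int]]:
        # All downward paths with exactly `length` nodes starting at `node`.
        if length == 1:
            return [[node]]
        found = []
        for child in children[node]:
            for tail in paths_from(child, length - 1):
                found.append([node] + tail)
        return found

    result = []
    for start in range(len(tokens)):
        for path in paths_from(start, n):
            result.append(tuple(tokens[i] for i in path))
    return result


# Hypothetical hand-written parse of "she ate a red apple":
# "ate" is the root; "she" and "apple" depend on "ate";
# "a" and "red" depend on "apple".
TOKENS = ["she", "ate", "a", "red", "apple"]
HEADS = [1, -1, 4, 4, 1]

if __name__ == "__main__":
    print(word_ngrams(TOKENS, 2))
    print(syntactic_ngrams(TOKENS, HEADS, 2))
    print(syntactic_ngrams(TOKENS, HEADS, 3))
```

Under this toy parse, the pair ("ate", "apple") appears among the syntactic bigrams although the two words are not adjacent in the text, while the surface bigram ("ate", "a") has no syntactic counterpart. Either kind of n-gram could then be counted per document and used as classification features.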
Author information
Authors and Affiliations
Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico
Grigori Sidorov, Francisco Velasquez & Alexander Gelbukh
University of the Aegean, Greece
Efstathios Stamatatos
ESIME, Instituto Politécnico Nacional (IPN), Mexico City, Mexico
Liliana Chanona-Hernández
Editor information
Editors and Affiliations
Mexican Petroleum Institute, Eje Central Lazaro Cardenas Norte, 152, Col. San Bartolo Atepehuacan, CP 07730, México D.F., Mexico
Ildar Batyrshin
Tecnológico de Monterrey, Campus Estado de México, Carretera Lago de Guadalupe Km 3.5, CP 52926, Atizapán de Zaragoza, Estado de México, Mexico
Miguel González Mendoza
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., Chanona-Hernández, L. (2013). Syntactic Dependency-Based N-grams as Classification Features. In: Batyrshin, I., Mendoza, M.G. (eds) Advances in Computational Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37798-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37797-6
Online ISBN: 978-3-642-37798-3
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science