Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)
Abstract
Homoglyphs can be used to disguise plagiarized text by replacing letters in the source text with visually identical letters from other scripts. Most current plagiarism detection systems cannot detect plagiarism in text that has been obfuscated with homoglyphs. In this work, we present two alternative approaches for detecting plagiarism in homoglyph-obfuscated texts. The first approach uses the Unicode list of confusables to replace homoglyphs with visually identical letters, while the second approach uses a similarity score computed from the normalized Hamming distance to match homoglyph-obfuscated words with source words. Empirical testing on datasets from PAN-2015 shows that both approaches perform equally well for plagiarism detection in homoglyph-obfuscated texts.
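As a rough illustration of the two ideas named in the abstract, the sketch below shows homoglyph replacement through a lookup table and word matching through a similarity score derived from the normalized Hamming distance. It is not the authors' implementation: the `CONFUSABLES` table is a small hand-picked subset rather than the full Unicode confusables list, the function names are hypothetical, the 0.5 threshold is an arbitrary illustrative value, and treating words of unequal length as non-matching is an assumption made here because Hamming distance is only defined for equal-length strings.

```python
# Hypothetical, hand-picked subset of homoglyph mappings (assumption for
# illustration only; the Unicode confusables list is far larger).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def replace_homoglyphs(text, table=CONFUSABLES):
    """Approach 1 (sketch): map each known homoglyph to its Latin look-alike."""
    return "".join(table.get(ch, ch) for ch in text)


def hamming_similarity(a, b):
    """Approach 2 (sketch): 1 minus the normalized Hamming distance.

    Assumption: words of different length (or empty words) score 0.0,
    since Hamming distance is defined only for equal-length strings.
    """
    if len(a) != len(b) or not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1.0 - mismatches / len(a)


def best_match(word, candidates, threshold=0.5):
    """Return the candidate most similar to `word`, or None if no candidate
    reaches `threshold` (an illustrative value, not taken from the paper)."""
    scored = [(hamming_similarity(word, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None


if __name__ == "__main__":
    obfuscated = "pl\u0430gi\u0430rism"  # both "a" letters are Cyrillic
    print(replace_homoglyphs(obfuscated))
    print(hamming_similarity(obfuscated, "plagiarism"))
    print(best_match(obfuscated, ["plagiarism", "detection", "homoglyphs"]))
```

In this sketch the first function normalizes text before any conventional detector runs, whereas the second leaves the text unchanged and instead tolerates a bounded number of character mismatches when aligning suspicious words with source words.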
Author information
Authors and Affiliations
University of Sheffield, Sheffield, S10 2TN, UK
Faisal Alvi, Mark Stevenson & Paul Clough
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
Faisal Alvi
Corresponding author
Correspondence to Faisal Alvi.
Editor information
Editors and Affiliations
University of Glasgow, Glasgow, United Kingdom
Joemon M Jose
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Middle East Technical University, Ankara, Turkey
Ismail Sengor Altıngovde
Open University, Milton Keynes, United Kingdom
Dawei Song
Signal Media, London, United Kingdom
Dyaa Albakour
Toronto, Canada
Stuart Watt
JohnTait.net Ltd. and BCS IRSG, Sunderland, United Kingdom
John Tait
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alvi, F., Stevenson, M., Clough, P. (2017). Plagiarism Detection in Texts Obfuscated with Homoglyphs. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56607-8
Online ISBN: 978-3-319-56608-5
eBook Packages: Computer Science, Computer Science (R0)