
Plagiarism Detection in Texts Obfuscated with Homoglyphs

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)


Abstract

Homoglyphs can be used to disguise plagiarized text by replacing letters in a source text with visually identical letters from other scripts. Most current plagiarism detection systems cannot detect plagiarism when the text has been obfuscated with homoglyphs. In this work, we present two alternative approaches for detecting plagiarism in homoglyph-obfuscated texts. The first approach uses the Unicode list of confusables to replace homoglyphs with their visually identical letters, while the second uses a similarity score based on normalized Hamming distance to match homoglyph-obfuscated words with source words. Empirical testing on datasets from PAN-2015 shows that both approaches perform equally well for plagiarism detection in homoglyph-obfuscated texts.
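The abstract describes both approaches only at a high level. The following is a minimal sketch of each idea, not the authors' implementation. It assumes the following: `CONFUSABLES` is a tiny hand-picked subset of the Unicode confusables list, whereas a real system would load the full file. Matching is done word by word. Words of different length are treated as non-matching, because Hamming distance is only defined for equal-length strings. The 0.5 similarity threshold is a hypothetical value. The function names `normalize_homoglyphs`, `hamming_similarity`, and `match_word` are illustrative.

```python
from typing import List, Optional

# ASSUMPTION: a tiny illustrative subset of the Unicode confusables list,
# mapping Cyrillic/Greek look-alikes to the Latin letters they imitate.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}
_TABLE = str.maketrans(CONFUSABLES)


def normalize_homoglyphs(text: str) -> str:
    """Approach 1 (sketch): replace each known homoglyph with its Latin look-alike."""
    return text.translate(_TABLE)


def hamming_similarity(a: str, b: str) -> float:
    """Approach 2 (sketch): 1 - Hamming distance / word length.

    Hamming distance is only defined for equal-length strings, so this
    sketch assigns similarity 0.0 to words of different length.
    """
    if len(a) != len(b) or not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 1.0 - mismatches / len(a)


def match_word(word: str, source_words: List[str],
               threshold: float = 0.5) -> Optional[str]:
    """Return the most similar source word if its score reaches the
    (hypothetical) threshold, otherwise None."""
    best, best_score = None, 0.0
    for candidate in source_words:
        score = hamming_similarity(word, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    # "plagiarism" with both 'a's replaced by CYRILLIC SMALL LETTER A
    obfuscated = "pl\u0430gi\u0430rism"
    print(obfuscated == "plagiarism")          # False: the strings differ
    print(normalize_homoglyphs(obfuscated))    # plagiarism
    print(match_word(obfuscated, ["detection", "plagiarism", "paraphrase"]))  # plagiarism
```

In this sketch the two approaches trade off differently. The confusables mapping is a deterministic preprocessing step that lets an existing detector run unchanged, but it only repairs characters that appear in the mapping. The Hamming-based match tolerates homoglyphs that are not in any mapping, but it needs a candidate source vocabulary and a threshold.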



Author information

Authors and Affiliations

  1. University of Sheffield, Sheffield, S10 2TN, UK

    Faisal Alvi, Mark Stevenson & Paul Clough

  2. King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia

    Faisal Alvi


Corresponding author

Correspondence to Faisal Alvi.

Editor information

Editors and Affiliations

  1. University of Glasgow, Glasgow, United Kingdom

    Joemon M Jose

  2. TU Delft - EWI/ST/WIS, Delft, The Netherlands

    Claudia Hauff

  3. Middle East Technical University, Ankara, Turkey

    Ismail Sengor Altıngovde

  4. Open University, Milton Keynes, United Kingdom

    Dawei Song

  5. Signal Media, London, United Kingdom

    Dyaa Albakour

  6. Toronto, Canada

    Stuart Watt

  7. JohnTait.net Ltd. and BCS IRSG, Sunderland, United Kingdom

    John Tait


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alvi, F., Stevenson, M., Clough, P. (2017). Plagiarism Detection in Texts Obfuscated with Homoglyphs. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol. 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_64
