Cross-Lingual Semantic Similarity Measure for Comparable Articles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Abstract

A similarity measure is required to find and compare cross-lingual articles about a specific topic. Such a measure can be based on bilingual dictionaries or on numerical methods such as Latent Semantic Indexing (LSI). In this paper, we use LSI in two ways to retrieve Arabic-English comparable articles. The first is monolingual: the English article is translated into Arabic and then mapped into the Arabic LSI space. The second is cross-lingual: Arabic and English documents are mapped into an Arabic-English LSI space. We then compare both LSI approaches with the dictionary-based approach on several English-Arabic parallel and comparable corpora. Results indicate that our cross-lingual LSI approach is competitive with the monolingual approach, and even better on some corpora. Moreover, both LSI approaches outperform the dictionary-based approach.
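The abstract does not spell out how a document lands in the cross-lingual LSI space. The following is a minimal sketch, assuming the classic cross-language LSI recipe of Littman, Dumais and Landauer: each training document is an Arabic text concatenated with its English counterpart, a truncated SVD of the term-document matrix defines the shared space, and each monolingual article is folded in as S_k^{-1} U_k^T d and compared by cosine similarity. Raw term counts, the `ar:`/`en:` token prefixes, the toy corpus, and the helper names (`tag`, `build_matrix`, `truncated_svd_left`, `fold_in`) are hypothetical choices for illustration; this is not the authors' implementation, which may differ in preprocessing, term weighting and dimensionality. The monolingual variant (machine-translating the English article and folding it into an Arabic-only space) is not shown.

```python
import math
import random


def tag(lang, text):
    """Hypothetical helper: prefix tokens with a language tag so terms never collide."""
    return [f"{lang}:{w}" for w in text.split()]


def build_matrix(docs, vocab):
    """Term-by-document matrix of raw counts (rows = terms, columns = documents)."""
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for term in doc:
            if term in index:
                matrix[index[term]][j] += 1.0
    return matrix


def truncated_svd_left(matrix, k, iters=300, seed=0):
    """Top-k left singular vectors U_k and singular values S_k of `matrix`,
    via power iteration on the Gram matrix M M^T with re-orthogonalization."""
    rng = random.Random(seed)
    n = len(matrix)
    gram = [[sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
            for i in range(n)]
    vectors, values = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            for u in vectors:  # remove directions already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-9:
                return vectors, values  # rank of the matrix is below k
            v = [x / norm for x in w]
        # Rayleigh quotient v^T G v gives the eigenvalue, i.e. the squared singular value.
        eigenvalue = sum(vi * sum(g * x for g, x in zip(row, v)) for vi, row in zip(v, gram))
        vectors.append(v)
        values.append(math.sqrt(max(eigenvalue, 0.0)))
    return vectors, values


def fold_in(tokens, vocab, u_k, s_k):
    """Map a bag of words into the k-dim LSI space: d_hat = S_k^{-1} U_k^T d."""
    index = {term: i for i, term in enumerate(vocab)}
    d = [0.0] * len(vocab)
    for t in tokens:
        if t in index:
            d[index[t]] += 1.0
    return [sum(ui * di for ui, di in zip(u, d)) / s for u, s in zip(u_k, s_k)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


# Hypothetical toy parallel corpus: each training document concatenates an
# Arabic text with its English translation.
parallel = [
    ("كرة فريق مباراة", "football team match"),
    ("فريق مباراة هدف", "team match goal"),
    ("سوق نفط سعر", "market oil price"),
    ("سعر سوق بنك", "price market bank"),
]
train_docs = [tag("ar", ar) + tag("en", en) for ar, en in parallel]
vocab = sorted({t for doc in train_docs for t in doc})
u_k, s_k = truncated_svd_left(build_matrix(train_docs, vocab), k=2)

# Monolingual articles are folded in separately, then compared by cosine.
arabic_doc = fold_in(tag("ar", "فريق هدف مباراة"), vocab, u_k, s_k)
english_sports = fold_in(tag("en", "team goal"), vocab, u_k, s_k)
english_economy = fold_in(tag("en", "oil price market"), vocab, u_k, s_k)
sim_sports = cosine(arabic_doc, english_sports)
sim_economy = cosine(arabic_doc, english_economy)
print(f"{sim_sports:.3f} {sim_economy:.3f}")
```

With this toy corpus the Arabic sports article should score close to 1 against the English sports article and close to 0 against the English economy article, even though they share no surface tokens. Power iteration keeps the sketch dependency-free; on a real corpus a sparse truncated SVD routine would be the practical choice.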

Author information

Authors and Affiliations

  1. SMarT Group, LORIA INRIA, Villers-lès-Nancy, F-54600, France

    Motaz Saad, David Langlois & Kamel Smaïli

  2. Université de Lorraine, LORIA, UMR 7503, Villers-lès-Nancy, F-54600, France

    Motaz Saad, David Langlois & Kamel Smaïli

  3. CNRS, LORIA, UMR 7503, Villers-lès-Nancy, F-54600, France

    Motaz Saad, David Langlois & Kamel Smaïli

Authors
  1. Motaz Saad

  2. David Langlois

  3. Kamel Smaïli

Editor information

Editors and Affiliations

  1. Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland

    Adam Przepiórkowski & Maciej Ogrodniczuk

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Saad, M., Langlois, D., Smaïli, K. (2014). Cross-Lingual Semantic Similarity Measure for Comparable Articles. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_11
