Cross-Lingual Semantic Similarity Measure for Comparable Articles

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Finding and comparing cross-lingual articles on a specific topic requires a measure of similarity. Such a measure can be based on bilingual dictionaries or on numerical methods such as Latent Semantic Indexing (LSI). In this paper, we use LSI in two ways to retrieve Arabic-English comparable articles. The first is monolingual: the English article is translated into Arabic and then mapped into the Arabic LSI space. The second is cross-lingual: Arabic and English documents are both mapped into a common Arabic-English LSI space. We then compare the LSI approaches with the dictionary-based approach on several English-Arabic parallel and comparable corpora. Results indicate that our cross-lingual LSI approach is competitive with the monolingual approach and even outperforms it on some corpora. Moreover, both LSI approaches outperform the dictionary-based approach.
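
The cross-lingual variant described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy parallel corpus, the raw term counts, the rank k = 2, and the helper names (bow, fold_in, cosine, similarity) are all assumptions made for illustration. Each training document concatenates an English text with its Arabic counterpart, a truncated SVD of the resulting term-document matrix defines the LSI space, and monolingual documents are folded into that space and compared by cosine similarity.

```python
import numpy as np

# Toy parallel training corpus (hypothetical data): each entry pairs an
# English document with its Arabic counterpart.
parallel = [
    ("market economy bank", "سوق اقتصاد بنك"),
    ("bank money market", "بنك مال سوق"),
    ("team match player", "فريق مباراة لاعب"),
    ("player ball team", "لاعب كرة فريق"),
]

# Cross-lingual training documents: concatenate both language versions so
# that English and Arabic terms share one vocabulary and one LSI space.
train_docs = [(en + " " + ar).split() for en, ar in parallel]
vocab = sorted({t for doc in train_docs for t in doc})
index = {t: i for i, t in enumerate(vocab)}


def bow(tokens):
    """Raw term-frequency vector over the merged vocabulary."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in index:  # out-of-vocabulary terms are ignored
            v[index[t]] += 1.0
    return v


# Term-document matrix (terms x documents) and its rank-k truncated SVD.
X = np.column_stack([bow(d) for d in train_docs])
U, s, _ = np.linalg.svd(X, full_matrices=False)
k = 2  # assumed number of latent dimensions for this toy example
U_k, s_k = U[:, :k], s[:k]


def fold_in(text):
    """Map a monolingual document into the LSI space: d^T U_k S_k^{-1}."""
    return bow(text.split()) @ U_k / s_k


def cosine(a, b):
    """Cosine similarity between two LSI vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity(en_text, ar_text):
    """Cross-lingual similarity of an English and an Arabic document."""
    return cosine(fold_in(en_text), fold_in(ar_text))


if __name__ == "__main__":
    print(similarity("economy market", "اقتصاد بنك"))  # same topic
    print(similarity("economy market", "كرة فريق"))    # different topic
```

In this toy setting the same-topic pair scores much higher than the cross-topic pair, even though the two documents share no surface terms. The monolingual variant would differ only in its inputs: the English document would first be machine-translated into Arabic, and the SVD would be computed on Arabic-only training documents.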

Author information

Authors and Affiliations

SMarT Group, LORIA INRIA, Villers-lès-Nancy, F-54600, France
Motaz Saad, David Langlois & Kamel Smaïli
Université de Lorraine, LORIA, UMR 7503, Villers-lès-Nancy, F-54600, France
Motaz Saad, David Langlois & Kamel Smaïli
CNRS, LORIA, UMR 7503, Villers-lès-Nancy, F-54600, France
Motaz Saad, David Langlois & Kamel Smaïli

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Adam Przepiórkowski & Maciej Ogrodniczuk

About this paper

Cite this paper

Saad, M., Langlois, D., Smaïli, K. (2014). Cross-Lingual Semantic Similarity Measure for Comparable Articles. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_11

DOI: https://doi.org/10.1007/978-3-319-10888-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science, Computer Science (R0)
