Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10570)
Abstract
Many Wikipedia articles that cover the same topic in different language editions are interconnected via cross-language links. These links help readers understand a topic in multiple languages and support cross-language information retrieval applications. However, cross-language links are added manually by Wikipedia users and, as a result, are often incorrect. In this paper, we propose an approach that automatically eliminates incorrect cross-language links. It is based on the observation that groups of articles pairwise connected through cross-language links form independent connected components. For each incoherent component (i.e., one that contains two or more articles from the same language edition), our approach assigns a correctness score to its cross-language links and removes those with the lowest scores to make the component coherent. The results of our evaluation on a snapshot of Wikipedia in 8 languages indicate that our approach is quantitatively promising.
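The procedure in the abstract (find connected components, flag those with two articles from the same language edition, drop the lowest-scored links until every component is coherent) can be illustrated with a minimal sketch. It rests on several assumptions. Articles are represented here as hypothetical `(language_edition, title)` pairs. The correctness scores are supplied as input, since the abstract does not describe how they are computed. Removal is a simple greedy loop that drops one link at a time and then recomputes the components. The paper's actual scoring function and removal strategy may differ.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: an article is (language_edition, title) and a
# cross-language link is an undirected pair of articles stored as a frozenset.
Article = Tuple[str, str]
Link = FrozenSet[Article]


def connected_components(nodes: Set[Article], links: Iterable[Link]) -> List[Set[Article]]:
    """Group articles that are connected, directly or transitively, by links."""
    adjacency: Dict[Article, Set[Article]] = defaultdict(set)
    for link in links:
        a, b = tuple(link)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[Article] = set()
    components: List[Set[Article]] = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.add(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def is_incoherent(component: Set[Article]) -> bool:
    """A component is incoherent if two of its articles share a language edition."""
    languages = [lang for lang, _ in component]
    return len(languages) != len(set(languages))


def make_coherent(scored_links: Iterable[Tuple[Link, float]]) -> Set[Link]:
    """Greedily drop the lowest-scored link of an incoherent component until none remain.

    The scores are taken as given (assumption); how they are computed is not modeled here.
    """
    scores: Dict[Link, float] = dict(scored_links)
    nodes = {article for link in scores for article in link}
    while True:
        incoherent = [c for c in connected_components(nodes, scores) if is_incoherent(c)]
        if not incoherent:
            return set(scores)
        component = incoherent[0]
        # Every link touching this component lies entirely inside it.
        candidates = [link for link in scores if link <= component]
        weakest = min(candidates, key=lambda link: scores[link])
        del scores[weakest]
```

As a usage example with made-up scores, consider links en:Paris to fr:Paris (0.9), fr:Paris to de:Paris (0.8) and de:Paris to en:"Paris (mythology)" (0.2). All four articles form one component containing two English articles, so it is incoherent. The sketch drops the 0.2 link, which leaves one coherent component {en, fr, de} and an isolated English article. Because each removal is followed by recomputing the components, a component that splits into several parts is repaired part by part.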
Author information
Authors and Affiliations
LRI, CentraleSupélec, Paris-Saclay University, 91190, Gif-sur-Yvette, France
Nacéra Bennacer, Francesca Bugiotti, Jorge Galicia, Mariana Patricio & Gianluca Quercini
Corresponding author
Correspondence to Gianluca Quercini.
Editor information
Editors and Affiliations
University of Sydney, Darlington, NSW, Australia
Athman Bouguettaya
Zhejiang University, Hangzhou, China
Yunjun Gao
Institute of Computing for Physics and Technology, Protvino, Russia
Andrey Klimenko
Nanyang Technological University, Singapore, Singapore
Lu Chen
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Xiangliang Zhang
Institute of Computing for Physics and Technology, Protvino, Russia
Fedor Dzerzhinskiy
Shanghai Jiao Tong University, Minhang Qu, China
Weijia Jia
Institute of Computing for Physics and Technology, Protvino, Russia
Stanislav V. Klimenko
City University of Hong Kong, Kowloon, Hong Kong
Qing Li
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Bennacer, N., Bugiotti, F., Galicia, J., Patricio, M., Quercini, G. (2017). Eliminating Incorrect Cross-Language Links in Wikipedia. In: Bouguettaya, A., et al. (eds.) Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol. 10570. Springer, Cham. https://doi.org/10.1007/978-3-319-68786-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68785-8
Online ISBN: 978-3-319-68786-5
eBook Packages: Computer Science, Computer Science (R0)