Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12861)
Abstract
In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two sets of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the Bilingual Evaluation Understudy (BLEU) metric, the SMT and NMT systems scored 26.47 and 32.44, respectively. The corpus was collected from the Internet using automatic and semi-automatic techniques, and it covers the religion, law, and news domains. The resulting corpus contains 225,304 parallel sentences and will be shared freely with the community. To our knowledge, it is the largest parallel corpus built so far for the Amharic language.
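The BLEU scores above compare system output with reference translations. To show roughly how such a score is computed, here is a minimal sketch of corpus-level BLEU: clipped n-gram precision for orders 1 to 4, combined by a geometric mean with uniform weights and multiplied by a brevity penalty. It assumes a single reference per sentence, whitespace tokenization, and no smoothing. It is not necessarily the tooling the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Hypothetical helper: corpus-level BLEU with one reference per
    hypothesis, uniform weights and no smoothing; returns a score in [0, 100]."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()  # assumed whitespace tokenization
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Every hypothesis n-gram matches, but the output is one token short,
    # so only the brevity penalty lowers the score.
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat today"]))
```

The choice of tokenizer matters, especially for a morphologically rich language such as Amharic, because BLEU counts surface n-grams. Published evaluations therefore usually rely on a standard implementation such as sacreBLEU rather than an ad hoc script.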
Supported by Bahir Dar Institute of Technology.
Author information
Authors and Affiliations
Bahir Dar Institute of Technology, Bahir Dar, Ethiopia
Yohanens Biadgligne
Loria - University of Lorraine, Nancy, France
Kamel Smaïli
Corresponding author
Correspondence to Kamel Smaïli.
Editor information
Editors and Affiliations
University of Granada, Granada, Spain
Ignacio Rojas
University of Málaga, Málaga, Spain
Gonzalo Joya
Technical University of Catalonia, Barcelona, Spain
Andreu Català
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Biadgligne, Y., Smaïli, K. (2021). Parallel Corpora Preparation for English-Amharic Machine Translation. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85029-6
Online ISBN: 978-3-030-85030-2
eBook Packages: Computer Science, Computer Science (R0)