Parallel Corpora Preparation for English-Amharic Machine Translation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12861)

Abstract

In this paper, we describe the development of an English-Amharic parallel corpus and the Machine Translation (MT) experiments conducted on it. Two kinds of experiments were carried out: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Measured with the Bilingual Evaluation Understudy (BLEU) metric, the systems score 26.47 for SMT and 32.44 for NMT. The corpus was collected from the Internet using automatic and semi-automatic techniques, and it covers the religion, law, and news domains. The resulting corpus comprises 225,304 parallel sentences and will be shared freely with the community. To our knowledge, this is the largest parallel corpus built so far for the Amharic language.
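
As a rough illustration of how a BLEU score like those reported in the abstract is computed, here is a minimal sketch of unsmoothed corpus-level BLEU. It assumes whitespace tokenization and a single reference per sentence. The names `ngrams` and `corpus_bleu` and the example sentences are hypothetical; the evaluation tooling the authors actually used is not stated here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: sentences are tokenized by splitting on whitespace.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp_tok, n), ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any order with no match gives zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the references.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat today"]
    refs = ["the cat sat on the mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Published scores usually come from a standard tool with a fixed tokenization, so values from a sketch like this are not directly comparable with the 26.47 and 32.44 reported for SMT and NMT.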

Supported by Bahir Dar Institute of Technology.

Author information

Authors and Affiliations

  1. Bahir Dar Institute of Technology, Bahir Dar, Ethiopia

    Yohanens Biadgligne

  2. Loria - University of Lorraine, Nancy, France

    Kamel Smaïli

Corresponding author

Correspondence to Kamel Smaïli.

Editor information

Editors and Affiliations

  1. University of Granada, Granada, Spain

    Ignacio Rojas

  2. University of Málaga, Málaga, Spain

    Gonzalo Joya

  3. Technical University of Catalonia, Barcelona, Spain

    Andreu Català

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Biadgligne, Y., Smaïli, K. (2021). Parallel Corpora Preparation for English-Amharic Machine Translation. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_37
