Automatic speech recognition system for Tunisian dialect

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging because annotated resources and tools are scarce, the dialect lacks standardization at all linguistic levels (phonological, morphological, syntactic and lexical), and the pronunciation dictionary needed for ASR development is missing. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we detail the creation of the Tunisian dialect corpus. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate of 22.6%.
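For readers unfamiliar with the metric reported above, Word Error Rate is conventionally defined as WER = (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the recognizer output and N is the number of words in the reference transcript. The following is a minimal sketch of that standard definition. It is not the authors' scoring procedure, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d)
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.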

Notes

  1. Treebanks are language resources that provide annotations of natural languages at various levels of structure: at the word level and the sentence level.

  2. An undiacritized (or unvowelized) word is a word written without short vowels.

  3. MADA is a POS tagger for Arabic languages.

  4. The Algerian dialect is the language used in the daily spoken communication of Algerians.

  5. Transcriber is distributed as free software and is available at http://trans.sourceforge.net.

Author information

Authors and Affiliations

  1. LIUM, Le Mans University, Le Mans, France

    Abir Masmoudi, Fethi Bougares & Yannick Estève

  2. ANLP Research group, MIRACL Lab., University of Sfax, Sfax, Tunisia

    Abir Masmoudi, Mariem Ellouze & Lamia Belguith

Authors
  1. Abir Masmoudi
  2. Fethi Bougares
  3. Mariem Ellouze
  4. Yannick Estève
  5. Lamia Belguith

Corresponding author

Correspondence to Abir Masmoudi.
