Abstract
Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging because annotated resources and tools are scarce, the dialect lacks standardization at all linguistic levels (phonological, morphological, syntactic and lexical), and the pronunciation dictionary needed for ASR development is missing. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we detail the creation of a Tunisian dialect corpus. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate of 22.6%.
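For readers unfamiliar with the metric, Word Error Rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization; the function name `word_error_rate` is illustrative and this is not the authors' scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. Real ASR scoring
    usually applies text normalization first, which is omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a score of 22.6% corresponds to roughly 23 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.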
Notes
Treebanks are language resources that provide annotations of natural languages at various levels of structure: at the word level and the sentence level.
An undiacritized (or unvowelized) word is a word written without short vowels.
MADA is a POS tagger for Arabic languages.
The Algerian dialect is the language used in the daily spoken communication of Algerians.
Transcriber is distributed as free software and is available at http://trans.sourceforge.net.
Author information
Authors and Affiliations
LIUM, Le Mans University, Le Mans, France
Abir Masmoudi, Fethi Bougares & Yannick Estève
ANLP Research group, MIRACL Lab., University of Sfax, Sfax, Tunisia
Abir Masmoudi, Mariem Ellouze & Lamia Belguith
Corresponding author
Correspondence to Abir Masmoudi.
About this article
Cite this article
Masmoudi, A., Bougares, F., Ellouze, M. et al. Automatic speech recognition system for Tunisian dialect. Lang Resources & Evaluation 52, 249–267 (2018). https://doi.org/10.1007/s10579-017-9402-y