{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T03:24:17Z","timestamp":1771125857547,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,7,12]],"date-time":"2019-07-12T00:00:00Z","timestamp":1562889600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2019,9,30]]},"abstract":"<jats:p>Modern Standard Arabic and the Arabic dialects are usually written without diacritics. The absence of these marks constitutes a real problem in the automatic processing of these data by NLP tools. Indeed, writing Arabic without diacritics introduces several types of ambiguity. First, a word without diacritics could have many possible meanings depending on its diacritization. Second, the undiacritized surface form of an Arabic word might have as many as 200 readings depending on the complexity of its morphology [12]. In fact, the agglutination property of Arabic might produce a problem that can only be resolved using diacritics. Third, without diacritics a word could have many possible parts of speech (POS) instead of one. This is the case with words that have the same spelling and POS tag but a different lexical sense, or words that have the same spelling but different POS tags and lexical senses [8]. Finally, there is ambiguity at the grammatical level (syntactic ambiguity). In this article, we propose the first work that investigates the automatic diacritization of Tunisian Dialect texts. We first describe our annotation guidelines and procedure. 
Then, we propose two major models, namely a statistical machine translation (SMT) model and a discriminative model that casts diacritization as a sequence classification task based on Conditional Random Fields (CRF). In the second approach, we integrate POS features to influence the generation of diacritics. Diacritics restoration was performed at both the word and the character levels. The results showed high scores of automatic diacritization based on the CRF system (Word Error Rate (WER) 21.44% for CRF and WER 34.6% for SMT).<\/jats:p>","DOI":"10.1145\/3297278","type":"journal-article","created":{"date-parts":[[2019,7,12]],"date-time":"2019-07-12T13:12:41Z","timestamp":1562937161000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Automatic Diacritics Restoration for Tunisian Dialect"],"prefix":"10.1145","volume":"18","author":[{"given":"Abir","family":"Masmoudi","sequence":"first","affiliation":[{"name":"MIRACL Laboratory-University of Sfax, Tunisia"}]},{"given":"Salima","family":"Mdhaffar","sequence":"additional","affiliation":[{"name":"MIRACL Laboratory-University of Sfax, Tunisia"}]},{"given":"Rahma","family":"Sellami","sequence":"additional","affiliation":[{"name":"MIRACL Laboratory-University of Sfax, Tunisia"}]},{"given":"Lamia Hadrich","family":"Belguith","sequence":"additional","affiliation":[{"name":"MIRACL Laboratory-University of Sfax, Tunisia"}]}],"member":"320","published-online":{"date-parts":[[2019,7,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/2780081.2780156"},{"key":"e_1_2_1_2_1","unstructured":"A. Ahmed and M. Elaraby. 2000. A large-scale computational processor of the Arabic morphology and applications. 
PhD thesis Faculty of Engineering Cairo University Giza Egypt."},{"key":"e_1_2_1_3_1","first-page":"320","article-title":"A rule-based approach for tagging non-vocalized Arabic words","volume":"6","author":"Al-Taani A.","year":"2009","journal-title":"Int. Arab J. Info. Technol."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Digital Signal Processing and Signal Processing Education Meeting (DSP\/SPE\u201913)","author":"Alotaibi Y. A."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 3rd Arabic Natural Language Processing Workshop (WANLP\u201917)","author":"Al-Badrashiny M."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Federation for Information Processing (IFIP\u201915)","author":"Ameur M."},{"key":"e_1_2_1_7_1","first-page":"2","article-title":"Automatic diacritics restoration for Arabic text","volume":"12","author":"Ayman A. Z.","year":"2016","journal-title":"International J. Comput. Info. Sci."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324913000284"},{"key":"e_1_2_1_9_1","unstructured":"T. Baccouche. 2003. L\u2019arabe d\u2019une koin dialectale une langue de culture Memoires de la societe linguistique de Paris TomeXI (les langues de Communication) 87--93."},{"key":"e_1_2_1_10_1","unstructured":"T. Baccouche. 1994. 
L\u2019emprunt en arabe moderne Beit Elhikma et IBLV."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing.","author":"Belinkov Y."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 2nd Workshop on Arabic Natural Language Processing.","author":"Bouamor H."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the International Conference on Applications of Natural Language to Information Systems, (NLDB\u201914)","author":"Boujelbane R."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/972470.972474"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of MTSummit.","author":"Diab M."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of Saudi 18th National Computer Conference (NCC\u201906)","author":"Elshafei M."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 3rd Arabic Natural Language Processing Workshop (WANLP\u201917)","author":"Fashwan A."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118637.1118641"},{"key":"e_1_2_1_19_1","unstructured":"M. L. Gibson. 1998. Dialect Contact in Tunisian Arabic: Sociolinguistic and Structural Aspects. PhD thesis. Department of linguistic science University of Reading."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2015.2464687"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Habash N."},{"key":"e_1_2_1_22_1","volume-title":"Conventional orthography for dialectal Arabic. 
In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912)","author":"Habash N."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.","author":"Habash N."},{"key":"e_1_2_1_24_1","first-page":"1","article-title":"A survey and comparative study of Arabic diacritization tools","volume":"32","author":"Hamed O.","year":"2017","journal-title":"J. Lang. Technol. Computat. Linguist."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 14th Annual Conference of the International Speech Communication.","author":"Harrat S."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1037\/xhp0000032"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 12th Conference on Language Engineering (ESOLEC\u201912)","author":"Hifny Y.","year":"2012"},{"key":"e_1_2_1_28_1","volume-title":"Modern Arabic: Structures, Functions, and Varieties","author":"Holes C.","year":"2004"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP\u201914)","author":"Jarrar M."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2005.01.004"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the Association for Computational Linguistics (ACL\u201907)","author":"Koehn P."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM\u201914)","author":"Kubra A."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201901)","author":"Lafferty J."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the British Computer Society Arabic NLP\/MT Conference.","author":"Maamouri M."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC\u201908)","author":"Maamouri 
M."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-017-9402-y"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages.","author":"Masmoudi A."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 19th Language Resources and Evaluation Conference.","author":"Masmoudi A."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 16th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201915)","author":"Masmoudi A."},{"key":"e_1_2_1_40_1","first-page":"53","article-title":"Plurilinguisme et diglossie en Tunisie","volume":"1","author":"Mejri S.","year":"2009","journal-title":"Synerg. Tunisie"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103321337421"},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"M. Rashwan A. Al Sallab H. Raafat and A. Rafea. 2015. Deep learning framework with confused sub set resolution architecture for automatic Arabic diacritization. IEEE\/ACM Trans. Audio Speech Lang. Process. 23 3 (2015).","DOI":"10.1109\/TASLP.2015.2395255"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 2nd Workshop on Arabic Natural Language Processing.","author":"Saadane H."},{"key":"e_1_2_1_44_1","unstructured":"I. Sfar. 2005. 
Morphologie des noms de professions : Incorporation et paraphrase La terminologie entre traduction et bilinguisme (2005) 15--16."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the Workshop on Computational Approaches to Semitic Languages (EACL\u201909)","author":"Shaalan K."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the Language Engineering Conference.","author":"Shaalan K."},{"key":"e_1_2_1_47_1","unstructured":"T. Schlippe. 2008. Statistical methods for automatic diacritization of Arabic texts. Carnegie Mellon University Pittsburgh PA."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP\u201902)","author":"Stolcke A.","year":"2002"},{"key":"e_1_2_1_49_1","unstructured":"F. Talmoudi. 1980. A morphosyntactic study of Romance verbs in the Arabic dialects of Tunis Sousa and Sfax. G\u00f6teborg Acta Universitatis Gothoburgensis."},{"key":"e_1_2_1_50_1","unstructured":"M. Tilmatine. 1999. Substrat Et Convergences: Le Berb\u00e9re Et L\u2019arabe Nord-Africain. In Estudios de Dialectologia Norteafricana Y Andalusi M. Haak R. Jong K. 
De Versteegh (Eds.)."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2010.2098440"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Zaghouani W."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Zaghouani W."},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the Association for Computational Linguistics Fourth Linguistic Annotation Workshop.","author":"Zaghouani W."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220248"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2008.06.001"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Zribi I."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2017.01.004"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.13053\/rcs-90-1-9"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-37247-6_13"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3297278","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3297278","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:14Z","timestamp":1750208534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3297278"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,12]]},"references-count":60,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,9,30]]}},"alternative-id":["10.1145\/3297278"],"URL":"https:\/\/doi.org\/10.1145\/3297278","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,12]]},"assertion":[{"value":"2018-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}