{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T17:01:36Z","timestamp":1758042096779,"version":"3.44.0"},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"9","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>In the past, Arabic Dialects (AD) have been poorly documented linguistically due to the lack of written forms and orthographies. However, in recent years, AD have become more widely used as a means of communication since social media and the everywhere availability of the internet have created a massive overflow of information and textual data, leading to a growing interest in Natural Language Processing (NLP) for these dialects. The highly inflectional morphology and the lack of standard orthography for AD pose an important challenge for NLP work. In this article, we handle the problem of lacking standard orthography during our work to build a morphological analyzer for Egyptian Arabic (EGY). To identify the guidelines for detecting conventional orthography, we depend on a corpus of 597,000 words that were gathered from various sources and genres. While analyzing the corpus morphologically, we handle the conventional orthography problem by assigning each word the conventional EGY Lemma and stem as close as possible to the EGY pronunciation no matter how it is typically written. Nevertheless, there are some common phenomena and complex cases involved in detecting conventional orthography during the morphological annotation process. Therefore, we take a closer look at and discuss these common phenomena and complex cases as we detect conventional orthography. 
These conventional orthographies are represented in a manner that facilitates the parsing of them correctly by the morphological analyzer. We tested the coverage of our morphological analyzer and compared it to one of the state-of-the-art morphological analyzers.<\/jats:p>","DOI":"10.1145\/3748324","type":"journal-article","created":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T11:43:31Z","timestamp":1752493411000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Computational Linguistic Approach to Orthographic Representation of Egyptian Arabic: Challenges and Implications"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8091-8405","authenticated-orcid":false,"given":"Amany","family":"Fashwan","sequence":"first","affiliation":[{"name":"Linguistics and Phonetics Department, Alexandria University, Faculty of Arts","place":["Alexandria, Egypt"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2950-9555","authenticated-orcid":false,"given":"Sameh","family":"Alansary","sequence":"additional","affiliation":[{"name":"Linguistics and Phonetics Department, Alexandria University, Faculty of Arts","place":["Alexandria, Egypt"]}]}],"member":"320","published-online":{"date-parts":[[2025,9,11]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","first-page":"207","DOI":"10.3115\/v1\/W14-3628","volume-title":"Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)","author":"Al-Mannai Kamla","year":"2014","unstructured":"Kamla Al-Mannai, Hassan Sajjad, Alaa Khader, Fahad Al Obaidli, Preslav Nakov, and Stephan Vogel. 2014. Unsupervised word segmentation improves dialectal Arabic to English machine translation. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP). 
207\u2013216."},{"key":"e_1_3_2_3_2","volume-title":"Levels of Contemporary Arabic in Egypt","author":"Badawi Al-Sayyid Mu\u1e25ammad","year":"1973","unstructured":"Al-Sayyid Mu\u1e25ammad Badawi. 1973. Levels of Contemporary Arabic in Egypt. Dar Al-Maarif."},{"key":"e_1_3_2_4_2","volume-title":"Proceedings of the 6th International Conference on Informatics and Systems, Infos2008. Cairo University","author":"Bakr Hitham Abo","year":"2008","unstructured":"Hitham Abo Bakr, Khaled Shaalan, and Ibrahim Ziedan. 2008. A hybrid approach for converting written Egyptian colloquial dialect into diacritized Arabic. In Proceedings of the 6th International Conference on Informatics and Systems, Infos2008. Cairo University. Citeseer."},{"key":"e_1_3_2_5_2","unstructured":"Yonatan Belinkov, Nizar Habash, Adam Kilgarriff, Noam Ordan, Ryan Roth, and V\u00edt Suchomel. 2013. arTenTen: A new vast corpus for Arabic. In Proceedings of WACL. 20."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"93","DOI":"10.3115\/v1\/W14-3612","volume-title":"Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)","author":"Bies Ann","year":"2014","unstructured":"Ann Bies, Zhiyi Song, Mohamed Maamouri, Stephen Grimes, Haejoong Lee, Jonathan Wright, Stephanie Strassel, Nizar Habash, Ramy Eskander, and Owen Rambow. 2014. Transliteration of Arabizi into Arabic orthography: Developing a parallel annotated Arabizi-Arabic script SMS\/Chat corpus. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP). 93\u2013103."},{"key":"e_1_3_2_7_2","first-page":"1240","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation (LREC)","author":"Bouamor Houda","year":"2014","unstructured":"Houda Bouamor, Nizar Habash, and Kemal Oflazer. 2014. A multidialectal parallel corpus of Arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 
1240\u20131245."},{"key":"e_1_3_2_8_2","first-page":"86","article-title":"Buckwalter Arabic Morphological Analyzer Version 1.0","author":"Buckwalter Tim","year":"2002","unstructured":"Tim Buckwalter. 2002. Buckwalter Arabic Morphological Analyzer Version 1.0. Linguistic Data Consortium, University of Pennsylvania. 86\u201393.","journal-title":"Linguistic Data Consortium, University of Pennsylvania"},{"key":"e_1_3_2_9_2","first-page":"66","volume-title":"Proceedings of the LREC Workshop on Semitic Language Processing","author":"Diab Mona","year":"2010","unstructured":"Mona Diab, Nizar Habash, Owen Rambow, Mohamed Altantawy, and Yassine Benajiba. 2010. COLABA: Arabic dialect annotation and processing. In Proceedings of the LREC Workshop on Semitic Language Processing. Citeseer, 66\u201374."},{"key":"e_1_3_2_10_2","first-page":"3455","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers","author":"Eskander Ramy","year":"2016","unstructured":"Ramy Eskander, Nizar Habash, Owen Rambow, and Arfath Pasha. 2016. Creating resources for dialectal Arabic from a single annotation: A case study on Egyptian and Levantine. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 3455\u20133465."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2021.05.084"},{"key":"e_1_3_2_12_2","first-page":"142","volume-title":"Proceedings of the 7th Arabic Natural Language Processing Workshop (WANLP)","author":"Fashwan Amany","year":"2022","unstructured":"Amany Fashwan and Sameh Alansary. 2022. Developing a tag-set and extracting the morphological lexicons to build a morphological analyzer for Egyptian Arabic. In Proceedings of the 7th Arabic Natural Language Processing Workshop (WANLP). 142\u2013160."},{"key":"e_1_3_2_13_2","volume-title":"Comparative Morphology of Standard and Egyptian Arabic","author":"Gadalla Hassan A. 
H.","year":"2000","unstructured":"Hassan A. H. Gadalla. 2000. Comparative Morphology of Standard and Egyptian Arabic. Vol. 5. Lincom Europa Munich."},{"key":"e_1_3_2_14_2","first-page":"711","volume-title":"Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912)","author":"Habash Nizar","year":"2012","unstructured":"Nizar Habash, Mona Diab, and Owen Rambow. 2012. Conventional orthography for dialectal Arabic. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912). 711\u2013718."},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)","author":"Habash Nizar","year":"2018","unstructured":"Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alexander Erdmann, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, et\u00a0al. 2018. Unified guidelines and resources for Arabic dialect orthography. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/2390930.2390931"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.08.003"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-016-9370-7"},{"key":"e_1_3_2_19_2","unstructured":"S. Khalifa, N. Habash, D. Abdulrahim, and S. Hassan. 2016. A large scale corpus of Gulf Arabic. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916). 4282\u20134289."},{"key":"e_1_3_2_20_2","volume-title":"Dialectal Arabic Processing Using Deep Learning","author":"Samih Younes","year":"2017","unstructured":"Younes Samih. 2017. Dialectal Arabic Processing Using Deep Learning. Ph.D. Dissertation. 
D\u00fcsseldorf, Heinrich-Heine-Universit\u00e4t."},{"key":"e_1_3_2_21_2","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation (LREC)","author":"Turki Houcemeddine","year":"2016","unstructured":"Houcemeddine Turki, Emad Adel, Tariq Daouda, and Nassim Regragui. 2016. A conventional orthography for Maghrebi Arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC)."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748324","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T13:36:43Z","timestamp":1757597803000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748324"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,11]]},"references-count":20,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3748324"],"URL":"https:\/\/doi.org\/10.1145\/3748324","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2025,9,11]]},"assertion":[{"value":"2024-09-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}