{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T17:26:45Z","timestamp":1774978005487,"version":"3.50.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2009,12,1]],"date-time":"2009-12-01T00:00:00Z","timestamp":1259625600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:p>The Arabic language presents researchers and developers of natural language processing (NLP) applications for Arabic text and speech with serious challenges. The purpose of this article is to describe some of these challenges and to present some solutions that would guide current and future practitioners in the field of Arabic natural language processing (ANLP). We begin with general features of the Arabic language in Sections 1, 2, and 3 and then we move to more specific properties of the language in the rest of the article. In Section 1 of this article we highlight the significance of the Arabic language today and describe its general properties. Section 2 presents the feature of Arabic Diglossia showing how the sociolinguistic aspects of the Arabic language differ from other languages. The stability of Arabic Diglossia and its implications for ANLP applications are discussed and ways to deal with this problematic property are proposed. Section 3 deals with the properties of the Arabic script and the explosion of ambiguity that results from the absence of short vowel representations and overt case markers in contemporary Arabic texts. 
We present in Section 4 specific features of the Arabic language such as the nonconcatenative property of Arabic morphology, Arabic as an agglutinative language, Arabic as a pro-drop language, and the challenges these properties pose to ANLP. We also present solutions that have already been adopted by some pioneering researchers in the field. In Section 5 we point out the lack of formal and explicit grammars of Modern Standard Arabic, which impedes the progress of more advanced ANLP systems. In Section 6 we draw our conclusions.<\/jats:p>","DOI":"10.1145\/1644879.1644881","type":"journal-article","created":{"date-parts":[[2010,1,5]],"date-time":"2010-01-05T15:05:08Z","timestamp":1262703908000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":333,"title":["Arabic Natural Language Processing"],"prefix":"10.1145","volume":"8","author":[{"given":"Ali","family":"Farghaly","sequence":"first","affiliation":[{"name":"Monterey Institute of International Studies"}]},{"given":"Khaled","family":"Shaalan","sequence":"additional","affiliation":[{"name":"The British University in Dubai"}]}],"member":"320","published-online":{"date-parts":[[2009,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-009-9054-9"},{"key":"e_1_2_1_2_1","volume-title":"Urdu. In Proceedings of the 2nd Workshop on Computational Approaches to Arabic Script-based Languages (CAASL\u201907)","author":"Almas Y."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10368"},{"key":"e_1_2_1_4_1","unstructured":"Attia M. 1999. A large scale computational processor of Arabic morphology and applications.
Master\u2019s Dissertation Computer Engineering Cairo University Egypt."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1654576.1654588"},{"key":"e_1_2_1_6_1","unstructured":"Attia M. 2008. Handling Arabic morphological and syntactic ambiguities within the LFG framework with a view to machine translation. PhD Dissertation University of Manchester."},{"key":"e_1_2_1_7_1","unstructured":"Badawi E. Carter M. G. and Gully A. 2004. Modern Written Arabic: A Comprehensive Grammar. Routledge London."},{"key":"e_1_2_1_8_1","unstructured":"Bakalla M. H. 2002. Arabic Culture Through Its Language and Literature. Kegan Paul London."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3115\/992628.992647"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the Workshop on Arabic Natural Language Processing at the 39th Annual Meeting of the Association for Computational Linguistics (ACL\u201901)","author":"Beesley K.","year":"2001"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908)","author":"Benajiba Y."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Bresnan J. 2000. Lexical Functional Syntax. Blackwell Publishers Inc. Malden MA.","DOI":"10.1093\/oso\/9780198238430.003.0011"},{"key":"e_1_2_1_13_1","unstructured":"Buckwalter T. 2002. Arabic transliteration.
http:\/\/www.qamus.org\/aramorph\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1621804.1621813"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Chomsky N. 1965. Aspects of the theory of syntax. MIT Press Cambridge MA.","DOI":"10.21236\/AD0616323"},{"key":"e_1_2_1_16_1","unstructured":"Chomsky N. 1981. Lectures on Government and Binding. Foris Publications Dordrecht."},{"key":"e_1_2_1_17_1","unstructured":"Chomsky N. 1982. Some concepts and consequences of the theory of government and binding. MIT Press Cambridge MA."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR\u201909)","author":"Choukri K.","year":"2009"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 6th Applied Natural Language Processing Conference (ANLP\u201900)","author":"Cavalli-Sforza V."},{"key":"e_1_2_1_20_1","unstructured":"CIA. 2008. CIA World Factbook.
Central Intelligence Agency Washington D.C."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 2nd Workshop on Psycho-Computational Models of Human Language Acquisition, Association for Computational Linguistics (ACL\u201905)","author":"Dell\u2019Orletta F."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL\u201907)","author":"Diab M."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 2nd English Language Symposium on Discourse Analysis (LSDA\u201982)","author":"Farghaly A.","year":"1982"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the Arabic Morphology Workshop (AMW\u201987)","author":"Farghaly A.","year":"1987"},{"key":"e_1_2_1_25_1","unstructured":"Farghaly A. 1999. Arabic diglossia and Arabic identity in the information age. Al-Fikr Al-Arabi March-April."},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Farghaly A. 2005. A case for inter-Arabic Grammar. In Elgibali A. Ed. Investigating Arabic: Current Parameters in Analysis and Learning. Brill Boston.","DOI":"10.1163\/9789047405085_004"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages (CAASL\u201907)","author":"Farghaly A.","year":"2007"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the International Arab Conference on Information Technology (ACIT\u201908)","author":"Farghaly A.","year":"2008"},{"key":"e_1_2_1_29_1","unstructured":"Farghaly A. 2010. Introduction in Arabic computational linguistics.
CSLI Publications Stanford CA."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the MT Summit IX, the Association for Machine Translation in the Americas (AMTA\u201903)","author":"Farghaly A."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/00437956.1959.11659702"},{"key":"e_1_2_1_32_1","volume-title":"Epilogue: Diglossia revisited. In Contemporary Arabic Linguistics in Honor of El-Said Badawi","author":"Ferguson C.","year":"1996"},{"key":"e_1_2_1_33_1","unstructured":"Fraser A. and Wong W. 2009. The Language Weaver Arabic to English statistical machine translation system. In Farghaly A. Ed. Arabic Computational Linguistics. CSLI Publications. To appear."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.3115\/992628.992709"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of Traitement Automatique du Langage Naturel (TALN\u201904)","author":"Habash N.","year":"2004"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the Association for Computational Linguistics (ACL\u201905)","author":"Habash N."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219911"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 2nd Conference on Computer Processing of the Arabic Language (CPAL\u201985)","author":"Hlal Y.","year":"1985"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the Workshop on Human Language Technology and Natural Language Processing within the Arabic World (LREC\u201908)","author":"Hosny A."},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 10th Text Retrieval Conference (TREC\u201901)","author":"Larkey L."},{"key":"e_1_2_1_41_1","unstructured":"Maamouri M. and Bies A. 2010. The Penn Arabic Treebank. In Farghaly A. Ed. Arabic Computational Linguistics.
CSLI Publications Stanford CA."},{"key":"e_1_2_1_42_1","first-page":"373","article-title":"A prosodic theory of nonconcatenative morphology","volume":"12","author":"McCarthy J.","year":"1981","journal-title":"Linguistic Inquiry"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380230602"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Ryding K. 2005. Reference grammar of modern standard Arabic. Cambridge University Press Cambridge UK.","DOI":"10.1017\/CBO9780511486975"},{"key":"e_1_2_1_45_1","unstructured":"Sag I. and Pollard C. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press Chicago IL."},{"key":"e_1_2_1_46_1","unstructured":"Sawaf H. 2009. The AppTek hybrid machine translation system. In Farghaly A. Ed. Arabic Computational Linguistics. CSLI Publications. To appear."},{"key":"e_1_2_1_47_1","article-title":"An intelligent computer-assisted language learning system for Arabic learners","volume":"18","author":"Shaalan K.","year":"2005","journal-title":"J. Int. Comput. Assist. Lang.
Learn."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/1067114.1067116"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1142\/S021942790400105X"},{"key":"e_1_2_1_50_1","volume-title":"Interlingua: A rule-based approach","author":"Shaalan K.","year":"2006"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP\u201907)","author":"Shaalan K."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85287-2_42"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.v60:8"},{"key":"e_1_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Soudi A. van den Bosch A. and Neumann G. eds. 2007. Arabic Computational Morphology: Knowledge-Based and Empirical Methods (Text Speech and Language Technology) Springer.","DOI":"10.1007\/978-1-4020-6046-5"},{"key":"e_1_2_1_55_1","unstructured":"Versteegh K. 1997. The Arabic Language.
Columbia University Press New York."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644879.1644881","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1644879.1644881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:41:18Z","timestamp":1750250478000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644879.1644881"}},"subtitle":["Challenges and Solutions"],"short-title":[],"issued":{"date-parts":[[2009,12]]},"references-count":55,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["10.1145\/1644879.1644881"],"URL":"https:\/\/doi.org\/10.1145\/1644879.1644881","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"value":"1530-0226","type":"print"},{"value":"1558-3430","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,12]]},"assertion":[{"value":"2009-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}