{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T00:47:58Z","timestamp":1775004478468,"version":"3.50.1"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T00:00:00Z","timestamp":1544745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Lebanese National Council for Scientific Research"},{"DOI":"10.13039\/100013011","name":"Birzeit University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100013011","id-type":"DOI","asserted-by":"crossref"}]},{"name":"VerbMesh project, funded by BZU research committee"},{"name":"Google's Faculty Research Award"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2019,6,30]]},"abstract":"<jats:p>\n            Words in Arabic consist of letters and short vowel symbols called diacritics inscribed atop regular letters. Changing diacritics may change the syntax and semantics of a word; turning it into another. This results in difficulties when comparing words based solely on string matching. Typically, Arabic NLP applications resort to morphological analysis to battle ambiguity originating from this and other challenges. In this article, we introduce three alternative algorithms to compare two words with possibly different diacritics. We propose the\n            <jats:italic>Subsume<\/jats:italic>\n            knowledge-based algorithm, the\n            <jats:italic>Imply<\/jats:italic>\n            rule-based algorithm, and the\n            <jats:italic>Alike<\/jats:italic>\n            machine-learning-based algorithm. 
We evaluated the soundness, completeness, and accuracy of the algorithms against a large dataset of 86,886 word pairs. Our evaluation shows the accuracy of Subsume (100%), Imply (99.32%), and Alike (99.53%). Although accurate, Subsume was able to judge only 75% of the data. Both Subsume and Imply are sound, while Alike is not. We demonstrate the utility of the algorithms using a real-life use case -- in lemma disambiguation and in linking hundreds of Arabic dictionaries.\n          <\/jats:p>","DOI":"10.1145\/3242177","type":"journal-article","created":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T13:19:17Z","timestamp":1544793557000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Diacritic-Based Matching of Arabic Words"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4351-4207","authenticated-orcid":false,"given":"Mustafa","family":"Jarrar","sequence":"first","affiliation":[{"name":"Birzeit University, West Bank, Palestine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fadi","family":"Zaraket","sequence":"additional","affiliation":[{"name":"American University, Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rami","family":"Asia","sequence":"additional","affiliation":[{"name":"Birzeit University, West Bank, Palestine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hamzeh","family":"Amayreh","sequence":"additional","affiliation":[{"name":"Birzeit University, West Bank, 
Palestine"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,12,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/AICCSA.2017.162"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCES.2008.4772979"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324913000284"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDIM.2012.6360097"},{"key":"e_1_2_1_6_1","volume-title":"ACL Workshop on Arabic Language Processing: Status and Perspective 1, 1--8.","author":"Beesley Kenneth","year":"2001","unstructured":"Kenneth Beesley. 2001. Finite-state morphological analysis and generation of Arabic at Xerox research: Status and plans. In ACL Workshop on Arabic Language Processing: Status and Perspective 1, 1--8."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.","author":"Belinkov Y.","unstructured":"Y. Belinkov and J. Glass. 2015. Arabic diacritization with recurrent neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of ACIT","author":"Boujelben Makram","year":"2008","unstructured":"Makram Boujelben, Chafik Aloulou, and Lamia Hadrich Belguith. 2008. 
Toward a robust detection\/correction system for the agreement errors in non-voweled arabic texts. In Proceedings of ACIT 2008."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 3rd Arabic Natural Language Processing Workshop, 9--17","author":"Darwish Kareem","unstructured":"Kareem Darwish, Hamdy Mubarak, and A. Abdelali. 2017. Arabic diacritization: Stats, rules, and hacks. In Proceedings of the 3rd Arabic Natural Language Processing Workshop, 9--17."},{"key":"e_1_2_1_11_1","unstructured":"Fathi Debili Hadh\u00e9mi Achour and E. Souissi. 2002. De l'\u00e9tiquetage grammatical \u00e0 la voyellation automatique de l'arabe. Technical Report."},{"key":"e_1_2_1_12_1","volume-title":"Arabic Computational Morphology","author":"Habash Nizar","unstructured":"Nizar Habash. 2007. Arabic morphological representations for machine translation. book chapter. In Arabic Computational Morphology. Springer, 263\u2014285."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of MEDAR\u201909","author":"Habash Nizar","year":"2007","unstructured":"Nizar Habash, Owen Rambow, and Ryan Roth. 2007. MADA+ TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. 
In Proceedings of MEDAR\u201909."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.7813\/2075-4124.2012\/4-4\/A.7"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-016-9370-7"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3603"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Experts Meeting on Arabic Ontologies and Semantic Networks. ALESCO, Arab League.","author":"Jarrar Mustafa","year":"2011","unstructured":"Mustafa Jarrar. 2011. Building a formal Arabic ontology. In Proceedings of the Experts Meeting on Arabic Ontologies and Semantic Networks. ALESCO, Arab League."},{"key":"e_1_2_1_18_1","volume-title":"The IFIP International Symposium on Data-Driven Process Discovery and Analysis.","author":"Jarrar Mustafa","year":"2011","unstructured":"Mustafa Jarrar, Anton Deik, and Bilal Faraj. 2011. Ontology-based data and process governance framework -the case of e-government interoperability in Palestine. The IFIP International Symposium on Data-Driven Process Discovery and Analysis."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135850"},{"key":"e_1_2_1_20_1","first-page":"4","article-title":"Towards pattern-based reasoning for friendly ontology debugging","volume":"17","author":"Jarrar Mustafa","year":"2008","unstructured":"Mustafa Jarrar and Stijn Heymans. 2008. 
Towards pattern-based reasoning for friendly ontology debugging. Journal of Artificial Intelligence Tools 17, 4, 2008.","journal-title":"Journal of Artificial Intelligence Tools"},{"key":"e_1_2_1_21_1","volume-title":"JADT 2010: 10th International Conference on Statistical Analysis of Textual Data.","author":"Kammoun Nouha Cha\u00e2ben","year":"2010","unstructured":"Nouha Cha\u00e2ben Kammoun, Lamia Hadrich Belguith, and Abdelmajid Ben Hamadou. 2010. The MORPH2 new version: A robust morphological analyzer for Arabic texts. JADT 2010: 10th International Conference on Statistical Analysis of Textual Data."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.4236\/jsea.2012.512B024"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 6th International Conference and Exhibition on Multi-lingual Computing, 3--5.","author":"Kiraz George Anton","year":"1998","unstructured":"George Anton Kiraz. 1998. Arabic computational morphology in the west. In Proceedings of the 6th International Conference and Exhibition on Multi-lingual Computing, 3--5."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of LREC\u20192010","author":"Kulick Seth","year":"2010","unstructured":"Seth Kulick, Ann Bies, and Mohamed Maamouri. 2010. Consistent and flexible integration of morphological annotation in the Arabic treebank. 
In Proceedings of LREC\u20192010."},{"key":"e_1_2_1_25_1","first-page":"4","article-title":"Hybrid approaches for the automatic vowelization of Arabic texts","volume":"3","author":"Mohamed B.","year":"2014","unstructured":"B. Mohamed, A. Chennoufi, A. Mazroui, and A. Lakhouaja. 2014. Hybrid approaches for the automatic vowelization of Arabic texts. Natural Language Computing. 3, 4.","journal-title":"Natural Language Computing."},{"key":"e_1_2_1_26_1","volume-title":"Programs for Machine Learning","author":"Quinlan J. R.","unstructured":"J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan-Kaufmann Publishers."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2010.2045240"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/1557690.1557721"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38824-8_5"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1621804.1621822"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2008.06.001"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3242177","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3242177","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:43:36Z","timestamp":1750207416000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3242177"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,14]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,6,30]]}},"alternative-id":["10.1145\/3242177"],"URL":"https:\/\/doi.org\/10.1145\/3242177","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,14]]},"assertion":[{"value":"2018-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}