{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:33:02Z","timestamp":1750307582831,"version":"3.41.0"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2009,12,1]],"date-time":"2009-12-01T00:00:00Z","timestamp":1259625600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:p>\n            The Arabic language has a very rich\/complex morphology. Each Arabic word is composed of zero or more\n            <jats:italic>prefixes<\/jats:italic>\n            , one\n            <jats:italic>stem<\/jats:italic>\n            and zero or more\n            <jats:italic>suffixes<\/jats:italic>\n            . Consequently, the Arabic data is sparse compared to other languages such as English, and it is necessary to conduct word segmentation before any natural language processing task. Therefore, the word-segmentation step is worth a deeper study since it is a preprocessing step which shall have a significant impact on all the steps coming afterward. In this article, we present an Arabic mention detection system that has very competitive results in the recent Automatic Content Extraction (ACE) evaluation campaign. We investigate the impact of different segmentation schemes on Arabic mention detection systems and we show how these systems may benefit from more than one segmentation scheme. We report the performance of several mention detection models using different kinds of possible and known segmentation schemes for Arabic text: punctuation separation, Arabic Treebank, and morphological and character-level segmentations. 
We show that the combination of competitive segmentation styles leads to a better performance. Results indicate a statistically significant improvement when Arabic Treebank and morphological segmentations are combined.\n          <\/jats:p>","DOI":"10.1145\/1644879.1644883","type":"journal-article","created":{"date-parts":[[2010,1,5]],"date-time":"2010-01-05T15:05:08Z","timestamp":1262703908000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Morphology-Based Segmentation Combination for Arabic Mention Detection"],"prefix":"10.1145","volume":"8","author":[{"given":"Yassine","family":"Benajiba","sequence":"first","affiliation":[{"name":"Center for Computational Learning Systems, Columbia University"}]},{"given":"Imed","family":"Zitouni","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center"}]}],"member":"320","published-online":{"date-parts":[[2009,12]]},"reference":[{"volume-title":"Proceedings of the Joint Meeting of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908)","author":"Benajiba Y.","key":"e_1_2_1_1_1","unstructured":"Benajiba , Y. , Diab , M. , and Rosso , P . 2008. Arabic named entity recognition using optimized feature sets . In Proceedings of the Joint Meeting of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908) . Benajiba, Y., Diab, M., and Rosso, P. 2008. Arabic named entity recognition using optimized feature sets. In Proceedings of the Joint Meeting of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/234285.234289"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.817452"},{"key":"e_1_2_1_4_1","unstructured":"Chen S. F. and Goodman J. 1998. An empirical study of smoothing techniques for language modeling. Tech. rep. 
TR-10-98 Center for Research in Computing Technology Harvard University. Chen S. F. and Goodman J. 1998. An empirical study of smoothing techniques for language modeling. Tech. rep. TR-10-98 Center for Research in Computing Technology Harvard University."},{"volume-title":"Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics (HLT\/NAACL\u201904)","author":"Diab M.","key":"e_1_2_1_5_1","unstructured":"Diab , M. , Hacioglu , K. , and Jurafsky , D . 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks . In Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics (HLT\/NAACL\u201904) . Diab, M., Hacioglu, K., and Jurafsky, D. 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics (HLT\/NAACL\u201904)."},{"volume-title":"Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics (HLT\/NAACL\u201904)","author":"Florian R.","key":"e_1_2_1_6_1","unstructured":"Florian , R. , Hassan , H. , Ittycheriah , A. , Jing , H. , Kambhatla , N. , Luo , X. , Nicolov , N. , and Roukos , S . 2004. A statistical model for multilingual entity detection and tracking . In Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics (HLT\/NAACL\u201904) . Florian, R., Hassan, H., Ittycheriah, A., Jing, H., Kambhatla, N., Luo, X., Nicolov, N., and Roukos, S. 2004. A statistical model for multilingual entity detection and tracking. 
In Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics (HLT\/NAACL\u201904)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073086"},{"key":"e_1_2_1_8_1","unstructured":"Graff D. 2003. Arabic gigaword. http:\/\/www.ldc.upenn.edu\/. Graff D. 2003. Arabic gigaword. http:\/\/www.ldc.upenn.edu\/."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220176"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119355.1119381"},{"volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201901)","author":"Lafferty J.","key":"e_1_2_1_11_1","unstructured":"Lafferty , J. , McCallum , A. , and Pereira , F . 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data . In Proceedings of the International Conference on Machine Learning (ICML\u201901) . Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning (ICML\u201901)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075096.1075147"},{"volume-title":"Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools (NEMLAR\u201904)","author":"Maamouri M.","key":"e_1_2_1_13_1","unstructured":"Maamouri , M. , Bies , A. , Buckwalter , T. , and Mekki , W . 2004. The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus . In Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools (NEMLAR\u201904) . Maamouri, M., Bies, A., Buckwalter, T., and Mekki, W. 2004. The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. 
In Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools (NEMLAR\u201904)."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 2nd International Workshop on Implementation and Application of Automata (CIAA\u201998)","volume":"1436","author":"Mohri M.","unstructured":"Mohri , M. , Pereira , F. C. N. , and Riley , M . 1998. A rational design for a weighted finite-state transducer library . In Proceedings of the 2nd International Workshop on Implementation and Application of Automata (CIAA\u201998) . D. Wood and S. Yu, Eds. Lecture Notes in Computer Science , vol. 1436 . Springer-Verlag: Berlin. 144--158. Mohri, M., Pereira, F. C. N., and Riley, M. 1998. A rational design for a weighted finite-state transducer library. In Proceedings of the 2nd International Workshop on Implementation and Application of Automata (CIAA\u201998). D. Wood and S. Yu, Eds. Lecture Notes in Computer Science, vol. 1436. Springer-Verlag: Berlin. 144--158."},{"key":"e_1_2_1_15_1","unstructured":"NIST. 2007. The ACE evaluation plan. www.nist.gov\/speech\/tests\/ace\/index.htm. NIST . 2007. The ACE evaluation plan. www.nist.gov\/speech\/tests\/ace\/index.htm."},{"volume-title":"Computer-Intensive Methods for Testing Hypotheses","author":"Noreen E. W.","key":"e_1_2_1_16_1","unstructured":"Noreen , E. W. 1989. Computer-Intensive Methods for Testing Hypotheses . John Wiley & Sons . Noreen, E. W. 1989. Computer-Intensive Methods for Testing Hypotheses. John Wiley & Sons."},{"volume-title":"Proceedings of the ACL Workshop on Combining Symbolic and Statistical Approaches to Language (ACL\u201994)","author":"Ramshaw L.","key":"e_1_2_1_17_1","unstructured":"Ramshaw , L. and Marcus , M . 1994. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging . In Proceedings of the ACL Workshop on Combining Symbolic and Statistical Approaches to Language (ACL\u201994) . 128--135. Ramshaw, L. and Marcus, M. 1994. 
Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the ACL Workshop on Combining Symbolic and Statistical Approaches to Language (ACL\u201994). 128--135."},{"volume-title":"Proceedings of the 3rd Workshop on Very Large Corpora (WVLC\u201995)","author":"Ramshaw L.","key":"e_1_2_1_18_1","unstructured":"Ramshaw , L. and Marcus , M . 1995. Text chunking using transformation-based learning . In Proceedings of the 3rd Workshop on Very Large Corpora (WVLC\u201995) . D. Yarowsky and K. Church, Eds. Association for Computational Linguistics, 82--94. Ramshaw, L. and Marcus, M. 1995. Text chunking using transformation-based learning. In Proceedings of the 3rd Workshop on Very Large Corpora (WVLC\u201995). D. Yarowsky and K. Church, Eds. Association for Computational Linguistics, 82--94."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118853.1118877"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/977035.977059"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073478"},{"volume-title":"Proceedings of the ACL Workshop on Computing Approaches to Semitic Languages (CASL\u201905)","author":"Zitouni I.","key":"e_1_2_1_22_1","unstructured":"Zitouni , I. , Sorensen , J. , Luo , X. , and Florian , R . 2005. The impact of morphological stemming on Arabic mention detection and coreference resolution . In Proceedings of the ACL Workshop on Computing Approaches to Semitic Languages (CASL\u201905) . Zitouni, I., Sorensen, J., Luo, X., and Florian, R. 2005. The impact of morphological stemming on Arabic mention detection and coreference resolution. 
In Proceedings of the ACL Workshop on Computing Approaches to Semitic Languages (CASL\u201905)."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644879.1644883","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1644879.1644883","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:41:18Z","timestamp":1750250478000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1644879.1644883"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,12]]},"references-count":22,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["10.1145\/1644879.1644883"],"URL":"https:\/\/doi.org\/10.1145\/1644879.1644883","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2009,12]]},"assertion":[{"value":"2009-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}