{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:21:10Z","timestamp":1750306870787,"version":"3.41.0"},"reference-count":15,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,2,1]],"date-time":"2014-02-01T00:00:00Z","timestamp":1391212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006785","name":"Google","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004963","name":"Seventh Framework Programme","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004963","id-type":"DOI","asserted-by":"publisher"}]},{"name":"SIERA Project"},{"DOI":"10.13039\/100013011","name":"Birzeit University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100013011","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2014,2]]},"abstract":"<jats:p>This article describes an algorithm for categorizing Arabic text, relying on highly categorized corpus-based datasets obtained from the Arabic Wikipedia by using manual and automated processes to build and customize categories. The categorization algorithm was built by adopting a simple categorization idea then moving forward to more complex ones. We applied tests and filtration criteria to reach the best and most efficient results that our algorithm can achieve. The categorization depends on the statistical relations between the input (test) text and the reference (training) data supported by well-defined Wikipedia-based categories. 
Our algorithm supports two levels for categorizing Arabic text; categories are grouped into a hierarchy of main categories and subcategories. This introduces a challenge due to the correlation between certain subcategories and overlap between main categories. We argue that our algorithm achieved good performance compared to other methods reported in the literature.<\/jats:p>","DOI":"10.1145\/2537129","type":"journal-article","created":{"date-parts":[[2014,3,4]],"date-time":"2014-03-04T13:24:59Z","timestamp":1393939499000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Arabic Text Categorization Based on Arabic Wikipedia"],"prefix":"10.1145","volume":"13","author":[{"given":"Adnan","family":"Yahya","sequence":"first","affiliation":[{"name":"Birzeit University"}]},{"given":"Ali","family":"Salhi","sequence":"additional","affiliation":[{"name":"Birzeit University"}]}],"member":"320","published-online":{"date-parts":[[2014,2]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"S. Al-Harbi S. Almuhareb A. Al-Thubaity and M.-S. Khorsheed. 2008. Automatic Arabic text classification. 9es Journ\u00e9es internationales d\u2019Analyse statistique des Donn\u00e9es Textuelles. http:\/\/lexicometrica.univ-paris3.fr\/jadt\/jadt2008\/pdf\/harbi-almuhareb-thubaity-khorsheed-rajeh.pdf.  S. Al-Harbi S. Almuhareb A. Al-Thubaity and M.-S. Khorsheed. 2008. Automatic Arabic text classification. 9es Journ\u00e9es internationales d\u2019Analyse statistique des Donn\u00e9es Textuelles . http:\/\/lexicometrica.univ-paris3.fr\/jadt\/jadt2008\/pdf\/harbi-almuhareb-thubaity-khorsheed-rajeh.pdf."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 4th International Multi-Conference on Computer Science and Information Technology.","volume":"4","author":"Al-Shalabi R.","year":"2006","unstructured":"R. Al-Shalabi , G. Kanaan , and G.-H. Manaf . 2006 . Arabic text categorization using kNN algorithm . 
In Proceedings of the 4th International Multi-Conference on Computer Science and Information Technology. Vol. 4 . http:\/\/www.uop.edu.jo\/download\/research\/members\/CSIT2006\/vol4&percnt;20pdf\/pg20.pdf. R. Al-Shalabi, G. Kanaan, and G.-H. Manaf. 2006. Arabic text categorization using kNN algorithm. In Proceedings of the 4th International Multi-Conference on Computer Science and Information Technology. Vol. 4. http:\/\/www.uop.edu.jo\/download\/research\/members\/CSIT2006\/vol4&percnt;20pdf\/pg20.pdf."},{"key":"e_1_2_1_3_1","first-page":"2","article-title":"Automated Arabic text categorization using SVM and NB","volume":"2","author":"Alsaleem S.","year":"2011","unstructured":"S. Alsaleem . 2011 . Automated Arabic text categorization using SVM and NB . Int. Arab J. e-Technol. 2 , 2 . http:\/\/www.iajet.org\/iajet_files\/vol.2\/no.2\/Automated&percnt;20Arabic&percnt;20Text&percnt;20Categorization&percnt;20Using&percnt;20SVM&percnt;20and&percnt;20NB_doc.pdf. S. Alsaleem. 2011. Automated Arabic text categorization using SVM and NB. Int. Arab J. e-Technol. 2, 2. http:\/\/www.iajet.org\/iajet_files\/vol.2\/no.2\/Automated&percnt;20Arabic&percnt;20Text&percnt;20Categorization&percnt;20Using&percnt;20SVM&percnt;20and&percnt;20NB_doc.pdf.","journal-title":"Int. Arab J. e-Technol."},{"key":"e_1_2_1_4_1","volume-title":"2nd International Workshop, C. Mahlow and M. Piotrowski Eds., http:\/\/sourceforge.net\/projects\/arabicwordcount\/.","author":"Attia M.","year":"2011","unstructured":"M. Attia , P. Pecina , L. Tounsi , A. Toral , and J.-V. Genabith . 2011 . A lexical database for modern standard Arabic interoperable with a finite state morphological transducer. In Systems and Frameworks for Computational Morphology , 2nd International Workshop, C. Mahlow and M. Piotrowski Eds., http:\/\/sourceforge.net\/projects\/arabicwordcount\/. M. Attia, P. Pecina, L. Tounsi, A. Toral, and J.-V. Genabith. 2011. 
A lexical database for modern standard Arabic interoperable with a finite state morphological transducer. In Systems and Frameworks for Computational Morphology, 2nd International Workshop, C. Mahlow and M. Piotrowski Eds., http:\/\/sourceforge.net\/projects\/arabicwordcount\/."},{"volume-title":"Proceedings of the 20th International Conference on Computational Linguistics. http:\/\/acl.ldc.upenn.edu\/W\/W04\/W04-1610","author":"El-Kourdi M.","key":"e_1_2_1_5_1","unstructured":"M. El-Kourdi , A. Bensaid , and T. Rachidi . 2004. Automatic Arabic document categorization based on the na\u00efve Bayes algorithm . In Proceedings of the 20th International Conference on Computational Linguistics. http:\/\/acl.ldc.upenn.edu\/W\/W04\/W04-1610 .pdf. M. El-Kourdi, A. Bensaid, and T. Rachidi. 2004. Automatic Arabic document categorization based on the na\u00efve Bayes algorithm. In Proceedings of the 20th International Conference on Computational Linguistics. http:\/\/acl.ldc.upenn.edu\/W\/W04\/W04-1610.pdf."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the Arabic NLP Workshop at ACL\/EACL. 73--79","author":"Goweder A.","year":"2001","unstructured":"A. Goweder and A. De Roeck . 2001. Assessment of a significant Arabic corpus . In Proceedings of the Arabic NLP Workshop at ACL\/EACL. 73--79 . http:\/\/www.abdelali.net\/ref\/ACL-EACL&percnt; 2020 01_goweder.pdf. A. Goweder and A. De Roeck. 2001. Assessment of a significant Arabic corpus. In Proceedings of the Arabic NLP Workshop at ACL\/EACL. 73--79. http:\/\/www.abdelali.net\/ref\/ACL-EACL&percnt;202001_goweder.pdf."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 6th Conference on Data. http:\/\/ww1.ucmss.com\/books\/LFS\/CSREA2006\/DMI5552","author":"Khreisat L.","year":"2006","unstructured":"L. Khreisat . 2006 . Arabic text classification using N-gram frequency statistics: A comparative study . In Proceedings of the 6th Conference on Data. http:\/\/ww1.ucmss.com\/books\/LFS\/CSREA2006\/DMI5552 .pdf. L. 
Khreisat. 2006. Arabic text classification using N-gram frequency statistics: A comparative study. In Proceedings of the 6th Conference on Data. http:\/\/ww1.ucmss.com\/books\/LFS\/CSREA2006\/DMI5552.pdf."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4020-6046-5_12"},{"key":"e_1_2_1_9_1","first-page":"598","article-title":"Hierarchical text classification with latent concepts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics","volume":"2","author":"Qiu X.","year":"2011","unstructured":"X. Qiu , X. Huang , Z. Liu , and J. Zhou . 2011 . Hierarchical text classification with latent concepts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics : Human Language Technologies: Short Papers. vol. 2. 598 -- 602 . https:\/\/www.aclweb.org\/anthology\/P\/P11\/P11-2105.pdf. X. Qiu, X. Huang, Z. Liu, and J. Zhou. 2011. Hierarchical text classification with latent concepts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers. vol. 2. 598--602. https:\/\/www.aclweb.org\/anthology\/P\/P11\/P11-2105.pdf.","journal-title":"Human Language Technologies: Short Papers."},{"key":"e_1_2_1_10_1","volume-title":"The impact of text preprocessing and term weighting on Arabic text classification. Master\u2019s thesis","author":"Saad M.","year":"2011","unstructured":"M. Saad . 2011. The impact of text preprocessing and term weighting on Arabic text classification. Master\u2019s thesis , Faculty of Engineering, The Islamic University, Gaza , Palestinian Territories . http:\/\/library.iugaza.edu.ps\/thesis\/9 1986 .pdf. M. Saad. 2011. The impact of text preprocessing and term weighting on Arabic text classification. Master\u2019s thesis, Faculty of Engineering, The Islamic University, Gaza, Palestinian Territories. http:\/\/library.iugaza.edu.ps\/thesis\/91986.pdf."},{"key":"e_1_2_1_11_1","unstructured":"A. 
Sarkar A. De Roeck and P. Garthwaite. 2004. Easy measures for evaluating non-English corpora for language engineering: Some lessons from Arabic and Bengali. Tech. rep. Department of Computing Open University. http:\/\/computing-reports.open.ac.uk\/2004\/2004_05.pdf.  A. Sarkar A. De Roeck and P. Garthwaite. 2004. Easy measures for evaluating non-English corpora for language engineering: Some lessons from Arabic and Bengali. Tech. rep. Department of Computing Open University. http:\/\/computing-reports.open.ac.uk\/2004\/2004_05.pdf."},{"key":"e_1_2_1_12_1","first-page":"221","article-title":"An intelligent system for Arabic text categorization","volume":"38","author":"Syiam M. M.","year":"2006","unstructured":"M. M. Syiam , Z. T. Fayed , and M. B. Habib 2006 . An intelligent system for Arabic text categorization . Int. J. Intell. Comput. Inform. Sci. 38 , 221 -- 243 . http: http:\/\/eprints.eemcs.utwente.nl\/19190\/01\/IJICIS2006.pdf. M. M. Syiam, Z. T. Fayed, and M. B. Habib 2006. An intelligent system for Arabic text categorization. Int. J. Intell. Comput. Inform. Sci. 38, 221--243. http: http:\/\/eprints.eemcs.utwente.nl\/19190\/01\/IJICIS2006.pdf.","journal-title":"Int. J. Intell. Comput. Inform. Sci."},{"volume-title":"Proceedings of the 7th International Conference on Innovations in Information Technology. http:\/\/ieeexplore.ieee.org\/stamp\/stamp.jsp?tp=&arnumber=5893871&isnumber=5893793","author":"Yahya A.","key":"e_1_2_1_13_1","unstructured":"A. Yahya and A. Salhi . 2011. Enhancement tools for Arabic Web search: A statistical approach . In Proceedings of the 7th International Conference on Innovations in Information Technology. http:\/\/ieeexplore.ieee.org\/stamp\/stamp.jsp?tp=&arnumber=5893871&isnumber=5893793 . A. Yahya and A. Salhi. 2011. Enhancement tools for Arabic Web search: A statistical approach. In Proceedings of the 7th International Conference on Innovations in Information Technology. 
http:\/\/ieeexplore.ieee.org\/stamp\/stamp.jsp?tp=&arnumber=5893871&isnumber=5893793."},{"key":"e_1_2_1_14_1","unstructured":"A. Yahya and A. Salhi. 2012. Arabic text correction using dynamic categorized dictionaries: A statistical approach. Linguistica Commun. J. 5.  A. Yahya and A. Salhi. 2012. Arabic text correction using dynamic categorized dictionaries: A statistical approach. Linguistica Commun. J. 5 ."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860455"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2537129","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2537129","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:18:13Z","timestamp":1750234693000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2537129"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2]]},"references-count":15,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,2]]}},"alternative-id":["10.1145\/2537129"],"URL":"https:\/\/doi.org\/10.1145\/2537129","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2014,2]]},"assertion":[{"value":"2013-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-02-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}