{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:26:41Z","timestamp":1775874401088,"version":"3.50.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2015,4,20]],"date-time":"2015-04-20T00:00:00Z","timestamp":1429488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2015,4,20]]},"abstract":"<jats:p>The rapid growth of the Internet and other computing facilities in recent years has resulted in the creation of a large amount of text in electronic form, which has increased the interest in and importance of different automatic text processing applications, including keyword extraction and term indexing. Although keywords are very useful for many applications, most documents available online are not provided with keywords. We describe a method for extracting keywords from Arabic documents. This method identifies the keywords by combining linguistics and statistical analysis of the text without using prior knowledge from its domain or information from any related corpus. The text is preprocessed to extract the main linguistic information, such as the roots and morphological patterns of derivative words. A cleaning phase is then applied to eliminate the meaningless words from the text. The most frequent terms are clustered into equivalence classes in which the derivative words generated from the same root and the non-derivative words generated from the same stem are placed together, and their count is accumulated. A vector space model is then used to capture the most frequent N-gram in the text. 
Experiments carried out using a real-world dataset show that the proposed method achieves good results with an average precision of 31% and average recall of 53% when tested against manually assigned keywords.<\/jats:p>","DOI":"10.1145\/2665077","type":"journal-article","created":{"date-parts":[[2015,6,18]],"date-time":"2015-06-18T18:14:05Z","timestamp":1434651245000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["Keyword Extraction from Arabic Documents using Term Equivalence Classes"],"prefix":"10.1145","volume":"14","author":[{"given":"Arafat","family":"Awajan","sequence":"first","affiliation":[{"name":"Princess Sumaya University for Technology, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,4,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10368"},{"key":"e_1_2_1_2_1","first-page":"188","article-title":"Multilayer model for Arabic text compression","volume":"8","author":"Awajan A.","year":"2011","journal-title":"Int. Arab J. Inform. Technol."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/992628.992647"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the International Arab Conference on Information Technology. http:\/\/www.itpapers.info\/acit10\/Papers\/f653","author":"Boudlal A."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199504)46:3%3C162::AID-ASI2%3E3.0.CO;2-6"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Diab M. Hacioglu K. and Jurafsky D. 2007. Automatic processing of modern standard Arabic text. In Arabic Computational Morphology. 
Springer 159--179.","DOI":"10.1007\/978-1-4020-6046-5_9"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2008.05.002"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 2nd International Conference on Arabic Language Resources and Tools. The MEDAR Consortium.","author":"El-Shishtawy T."},{"key":"e_1_2_1_9_1","unstructured":"ESCWA. 2012. Status of the digital Arabic content industry in the Arab region. Economic and Social Commission for Western Asia-United Nations. http:\/\/www.escwa.un.org\/information\/publications\/edit\/upload\/E_ESCWA_ICTD_12_TP-4_E.pdf."},{"key":"e_1_2_1_10_1","unstructured":"Giarlo M. J. 2006. A comparative analysis of keyword extraction techniques Rutgers University. http:\/\/lackoftalent.org\/michael\/papers\/596.pdf."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics (COLING). 394--402","author":"Green S."},{"key":"e_1_2_1_12_1","unstructured":"Habash N. Y. 2012. Introduction to Arabic Language Processing. Morgan and Claypool."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Habash N. Soudi A. and Buckwalter T. 2007. On Arabic transliteration. In Arabic Computational Morphology. Springer. 
15--22.","DOI":"10.1007\/978-1-4020-6046-5_2"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199710)48:10%3C867::AID-ASI3%3E3.3.CO;2-R"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119355.1119383"},{"key":"e_1_2_1_16_1","unstructured":"Hulth A. 2004. Combining machine learning and natural language processing for automatic keyword extraction. Ph.D. Dissertation Department of Computer and Systems Sciences Stockholm University."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026526"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. 257--266","author":"Liu Z."},{"key":"e_1_2_1_19_1","unstructured":"Manning C. D. Raghavan P. and Schütze H. 2009. An Introduction to Information Retrieval. Cambridge University Press UK."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213004001466"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of EMNLP. Association for Computational Linguistics. 404--411","author":"Mihalcea R."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Rose S. Engel D. Cramer N. and Cowley W. 2010. Automatic keyword extraction from individual documents. In Text Mining: Applications and Theory M. W. Berry and J. Kogan (Eds.). John Wiley & Sons. 3--20.","DOI":"10.1002\/9780470689646.ch1"},{"key":"e_1_2_1_23_1","unstructured":"Saad M. 2011. Arabic Corpora. 
http:\/\/sourceforge.net\/projects\/ar-textmining\/files\/Arbic-Corpora\/. (Last accessed 5\/13)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/361219.361220"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/313238.313437"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2665077","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2665077","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:25Z","timestamp":1750227205000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2665077"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,4,20]]},"references-count":25,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,4,20]]}},"alternative-id":["10.1145\/2665077"],"URL":"https:\/\/doi.org\/10.1145\/2665077","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,4,20]]},"assertion":[{"value":"2013-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-04-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}