{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T20:01:09Z","timestamp":1759694469381},"reference-count":46,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Modelling the distributional semantics of such a morphologically rich language as Arabic needs to take into account its introflexive, fusional, and inflectional nature, attributes that make up its combinatorial sequences and substitutional paradigms. The benchmarks used thus far in Arabic to evaluate such word distributional models have mimicked those in English. This paper reports on a benchmark that we designed to reflect linguistic patterns in both Contemporary Arabic and Classical Arabic, the first being a cover term for written and spoken Modern Standard Arabic and the second for pre-modern Arabic. The analogy items we included in this benchmark were chosen transparently so that they capture the major features of nouns and verbs; derivational and inflectional morphology; high-, medium-, and low-frequency patterns and lexical items; and morphosemantic, morphosyntactic, and semantic dimensions of the language. All categories included in this benchmark were carefully selected to ensure proper representation of the language. 
The benchmark consists of 45 roots of the trilateral, all-consonantal, and semivowel-inclusive types; six morphosemantic patterns (\u2019af\u2018ala; ifta\u2018ala; infa\u2018ala; istaf\u2018ala; tafa\u2018\u2018ala; and taf\u0101\u2018ala); five derivations (the verbal noun, the active participle, and the Masculine-Feminine, Feminine-Singular-Plural, and Masculine-Singular-Plural contrasts); morphosyntactic transformations (perfect and imperfect verbs conjugated for all pronouns); and lexical semantics (synonyms, antonyms, and hyponyms of nouns, verbs, and adjectives), as well as capital cities and currencies. All categories include an equal proportion of high-, medium-, and low-frequency items. To validate the proposed benchmark, we developed a set of embedding models from different textual sources and tested them intrinsically using the proposed benchmark and extrinsically on two natural language processing tasks: Arabic Named Entity Recognition and Text Classification. The evaluation leads to the conclusion that the proposed benchmark truly reflects this morphologically rich language and discriminates among word embeddings.<\/jats:p>","DOI":"10.1017\/s1351324922000444","type":"journal-article","created":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T07:42:47Z","timestamp":1665992567000},"page":"978-1003","update-policy":"http:\/\/dx.doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":2,"title":["A benchmark for evaluating Arabic word embedding models"],"prefix":"10.1017","volume":"29","author":[{"given":"Sane","family":"Yagi","sequence":"first","affiliation":[]},{"given":"Ashraf","family":"Elnagar","sequence":"additional","affiliation":[]},{"given":"Shehdeh","family":"Fareh","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,10,17]]},"reference":[{"key":"S1351324922000444_ref42","unstructured":"Sibawayh, A.i.U. and Ya\u2018qub, I. (1999). al-Kitab. 
Dar al-Kutub al-Ilmiyah."},{"key":"S1351324922000444_ref44","unstructured":"Ul\u010dar, M. , Vaik, K. , Lindstr\u00f6m, J. , Dailid\u0117nait\u0117, M. and Robnik-\u0160ikonja, M. (2020). Multilingual culture-independent word analogy datasets. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France. European Language Resources Association, pp. 4074\u20134080."},{"key":"S1351324922000444_ref45","doi-asserted-by":"crossref","DOI":"10.1075\/z.176","volume-title":"An Introduction to Linguistic Typology","author":"Velupillai","year":"2012"},{"key":"S1351324922000444_ref9","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113598"},{"key":"S1351324922000444_ref32","unstructured":"Mikolov, T. , Chen, K. , Corrado, G. and Dean, J. (2013). Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2\u20134, 2013, Workshop Track Proceedings."},{"key":"S1351324922000444_ref19","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1016\/j.ipm.2019.102121","article-title":"Arabic text classification using deep learning models","volume":"57","author":"Elnagar","year":"2020","journal-title":"Information Processing and Management"},{"key":"S1351324922000444_ref21","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.10.474"},{"key":"S1351324922000444_ref39","unstructured":"Romanov, M. and Seydi, M. (2019). OpenITI: A Machine-Readable Corpus of Islamicate Texts."},{"key":"S1351324922000444_ref2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.07.006"},{"key":"S1351324922000444_ref10","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8258460"},{"key":"S1351324922000444_ref13","unstructured":"Bolukbasi, T. , Chang, K.-W. , Zou, J. Y. , Saligrama, V. and Kalai, A.T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems, pp. 
4349\u20134357."},{"key":"S1351324922000444_ref5","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.01.006"},{"key":"S1351324922000444_ref28","doi-asserted-by":"publisher","DOI":"10.1109\/IALP51396.2020.9310507"},{"key":"S1351324922000444_ref20","first-page":"35","volume-title":"Hotel Arabic-Reviews Dataset Construction for Sentiment Analysis Applications","author":"Elnagar","year":"2018"},{"key":"S1351324922000444_ref46","doi-asserted-by":"crossref","first-page":"153","DOI":"10.33806\/ijaes2000.3.1.10","article-title":"Computerizing arabic morphology","volume":"3","author":"Yagi","year":"2002","journal-title":"International Journal of Arabic-English Studies"},{"key":"S1351324922000444_ref47","first-page":"430","volume-title":"International Conference on Intelligent Text Processing and Computational Linguistics","author":"Zahran","year":"2015"},{"key":"S1351324922000444_ref41","doi-asserted-by":"crossref","unstructured":"Schluter, N. (2018). The word analogy testing caveat. In Walker M.A., Ji H. and Stent A. (eds), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1\u20136, 2018, Volume 2 (Short Papers). Association for Computational Linguistics, pp. 242\u2013246.","DOI":"10.18653\/v1\/N18-2039"},{"key":"S1351324922000444_ref43","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2017.10.117"},{"key":"S1351324922000444_ref36","doi-asserted-by":"crossref","first-page":"106836","DOI":"10.1016\/j.asoc.2020.106836","article-title":"Deep learning for arabic subjective sentiment analysis: Challenges and research opportunities","volume":"98","author":"Nassif","year":"2021","journal-title":"Applied Soft Computing"},{"key":"S1351324922000444_ref38","doi-asserted-by":"crossref","unstructured":"Orabi, M. , El Rifai, H. and Elnagar, A. (2020). Classical arabic poetry: Classification based on era. 
In 2020 IEEE\/ACS 17th International Conference on Computer Systems and Applications (AICCSA). IEEE, pp. 1\u20136.","DOI":"10.1109\/AICCSA50499.2020.9316520"},{"volume-title":"al-Mujam al-Arabi: Dirasa Ihsaiya li-Dawaran al-Huruf fi al-Judhur al-Arabiya","year":"1983","author":"Alam","key":"S1351324922000444_ref7"},{"key":"S1351324922000444_ref17","doi-asserted-by":"publisher","DOI":"10.1016\/j.dib.2019.104076"},{"key":"S1351324922000444_ref23","doi-asserted-by":"crossref","first-page":"31010","DOI":"10.1109\/ACCESS.2021.3059504","article-title":"Systematic literature review of dialectal arabic: Identification and detection","volume":"9","author":"Elnagar","year":"2021","journal-title":"IEEE Access"},{"key":"S1351324922000444_ref25","doi-asserted-by":"crossref","first-page":"102438","DOI":"10.1016\/j.ipm.2020.102438","article-title":"A comparative study of effective approaches for arabic sentiment analysis","volume":"58","author":"Farha","year":"2021","journal-title":"Information Processing and Management"},{"key":"S1351324922000444_ref29","unstructured":"Khusainova, A. , Khan, A. and Rivera, A.R. (2019). Sart-similarity, analogies, and relatedness for tatar language: New benchmark datasets for word embeddings evaluation. arXiv preprint arXiv:1904.00365."},{"key":"S1351324922000444_ref16","doi-asserted-by":"publisher","DOI":"10.1109\/AICCSA47632.2019.9035362"},{"key":"S1351324922000444_ref40","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.07.003"},{"key":"S1351324922000444_ref22","doi-asserted-by":"crossref","unstructured":"Elnagar, A. , Yagi, S. , Nassif, A.B. , Shahin, I. and Salloum, S.A. (2021a). Sentiment analysis in dialectal arabic: A systematic review. In International Conference on Advanced Machine Learning Technologies and Applications. Springer, pp. 407\u2013417.","DOI":"10.1007\/978-3-030-69717-4_39"},{"key":"S1351324922000444_ref15","unstructured":"Buckwalter, T. and Parkinson, D.L. (2011). 
A Frequency Dictionary of Arabic: Core Vocabulary for Learners . Routledge Frequency Dictionaries. London, New York: Routledge."},{"key":"S1351324922000444_ref31","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1062"},{"key":"S1351324922000444_ref33","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1016\/j.jksuci.2020.01.004","article-title":"Qsst: A quranic semantic search tool based on word embedding","volume":"34","author":"Mohamed","year":"2022","journal-title":"Journal of King Saud University - Computer and Information Sciences"},{"key":"S1351324922000444_ref27","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-015-9304-9"},{"key":"S1351324922000444_ref12","first-page":"143","volume-title":"International Conference on Intelligent Text Processing and Computational Linguistics","author":"Benajiba","year":"2007"},{"key":"S1351324922000444_ref1","doi-asserted-by":"crossref","unstructured":"Abbas, M. , Lichouri, M. and Zeggada, A. (2019). Classification of arabic poems: From the 5th to the 15th century. In Cristani, M., Prati, A., Lanz, O., Messelodi, S. and Sebe, N. (eds), New Trends in Image Analysis and Processing \u2013 ICIAP 2019. Springer International Publishing, pp. 179\u2013186.","DOI":"10.1007\/978-3-030-30754-7_18"},{"key":"S1351324922000444_ref24","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2072"},{"key":"S1351324922000444_ref4","first-page":"263","article-title":"A scalable shallow learning approach for tagging arabic news articles","volume":"6","author":"Al Qadi","year":"2020","journal-title":"Jordanian Journal of Computer and Information Technology (JJCIT)"},{"key":"S1351324922000444_ref11","unstructured":"Bakarov, A. (2018). A survey of word embeddings evaluation methods. CoRR, abs\/1801.09536."},{"key":"S1351324922000444_ref26","doi-asserted-by":"crossref","unstructured":"Gladkova, A. , Drozd, A. and Matsuoka, S. (2016). 
Analogy-based detection of morphological and semantic relations with word embeddings: What works and what doesn\u2019t. In Proceedings of the NAACL Student Research Workshop, pp. 8\u201315.","DOI":"10.18653\/v1\/N16-2002"},{"key":"S1351324922000444_ref14","doi-asserted-by":"crossref","first-page":"102124","DOI":"10.1016\/j.ipm.2019.102124","article-title":"Building a morpho-semantic knowledge graph for arabic information retrieval","volume":"57","author":"Bounhas","year":"2020","journal-title":"Information Processing and Management"},{"key":"S1351324922000444_ref6","doi-asserted-by":"crossref","unstructured":"AL-Smadi, M. , Jaradat, Z. , AL-Ayyoub, M. and Jararweh, Y. (2017). Paraphrase identification and semantic text similarity analysis in arabic news tweets using lexical, syntactic, and semantic features. Information Processing & Management 53(3), 640\u2013652.","DOI":"10.1016\/j.ipm.2017.01.002"},{"key":"S1351324922000444_ref3","doi-asserted-by":"publisher","DOI":"10.1109\/ICTCS.2019.8923073"},{"key":"S1351324922000444_ref18","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-021-06390-z"},{"key":"S1351324922000444_ref30","unstructured":"K\u00f6per, M. , Scheible, C. and im Walde, S.S. (2015). Multilingual reliability and \u201csemantic\u201d structure of continuous word spaces. In Proceedings of the 11th International Conference on Computational Semantics, pp. 40\u201345."},{"key":"S1351324922000444_ref34","first-page":"1","article-title":"Empirical evaluation of shallow and deep learning classifiers for arabic sentiment analysis","volume":"21","author":"Nassif","year":"2021","journal-title":"Transactions on Asian and Low-Resource Language Information Processing"},{"key":"S1351324922000444_ref8","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.10.460"},{"key":"S1351324922000444_ref37","doi-asserted-by":"crossref","unstructured":"Nissim, M. , van Noord, R. and van der Goot, R. (2020). 
Fair is better than sensational: Man is to doctor as woman is to doctor.","DOI":"10.1162\/coli_a_00379"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324922000444","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,5]],"date-time":"2024-10-05T22:22:50Z","timestamp":1728166970000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324922000444\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,17]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["S1351324922000444"],"URL":"https:\/\/doi.org\/10.1017\/s1351324922000444","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2022,10,17]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}