{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T20:36:21Z","timestamp":1776285381131,"version":"3.50.1"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,6,9]],"date-time":"2021-06-09T00:00:00Z","timestamp":1623196800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Scientific Research and Innovation Support Fund"},{"name":"Ministry of Higher Education, Jordan"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2021,7,31]]},"abstract":"<jats:p>Measuring semantic similarity between short texts is an important task in many applications of natural language processing, such as paraphrasing identification. This process requires a benchmark of sentence pairs that are labeled by Arab linguists and considered a standard that can be used by researchers when evaluating their results. This research describes an Arabic paraphrasing benchmark to be a good standard for evaluation algorithms that are developed to measure semantic similarity for Arabic sentences to detect paraphrasing in the same language. The transformed sentences are in accordance with a set of rules for Arabic paraphrasing. These sentences are constructed from the words in the Arabic word semantic similarity dataset and from different Arabic books, educational texts, and lexicons. The proposed benchmark consists of 1,010 sentence pairs wherein each pair is tagged with scores determining semantic similarity and paraphrasing. The quality of the data is assessed using statistical analysis for the distribution of the sentences over the Arabic transformation rules and exploration through hierarchical clustering (HCL). 
Our exploration using HCL shows that the sentences in the proposed benchmark are grouped into 27 clusters representing different subjects. The inter-annotator agreement measures show a moderate agreement for the annotations of the graduate students and a poor reliability for the annotations of the undergraduate students.<\/jats:p>","DOI":"10.1145\/3446770","type":"journal-article","created":{"date-parts":[[2021,6,9]],"date-time":"2021-06-09T15:06:45Z","timestamp":1623251205000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Building Arabic Paraphrasing Benchmark based on Transformation Rules"],"prefix":"10.1145","volume":"20","author":[{"given":"Marwah","family":"Alian","sequence":"first","affiliation":[{"name":"Princess Sumaya University for Technology, Amman, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arafat","family":"Awajan","sequence":"additional","affiliation":[{"name":"Princess Sumaya University for Technology, Amman, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmad","family":"Al-Hasan","sequence":"additional","affiliation":[{"name":"Hashemite University, Zarqa, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raeda","family":"Akuzhia","sequence":"additional","affiliation":[{"name":"Hashemite University, Zarqa, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"2013 International Conference on Recent Trends in Information Technology (ICRTIT). 472\u2013477","author":"Vaishnavi V.","unstructured":"V. Vaishnavi, Madhesh Saritha, and S. 
Milton Rajendram. 2013. Paraphrase identification in short texts using grammar patterns. In 2013 International Conference on Recent Trends in Information Technology (ICRTIT). 472\u2013477."},{"key":"e_1_2_1_2_1","volume-title":"11th Annual Research Colloquium of the UK Special Interest Group for Computational Linguistics.","author":"Fernando Samuel","unstructured":"Samuel Fernando and Mark Stevenson. 2008. A semantic similarity approach to paraphrase detection. In 11th Annual Research Colloquium of the UK Special Interest Group for Computational Linguistics."},{"key":"e_1_2_1_3_1","volume-title":"Paraphrase generation and information retrieval from stored text. Mechanical Translation and Computational Linguistics 11, 1 and 2","author":"Culicover Peter W.","year":"1968","unstructured":"Peter W. Culicover. 1968. Paraphrase generation and information retrieval from stored text. Mechanical Translation and Computational Linguistics 11, 1 and 2 (1968), 78\u201388."},{"key":"e_1_2_1_4_1","volume-title":"International Workshop on Natural Language Processing for Social Media (SocialNLP\u201915)","author":"An Vo Ngoc Phuoc","year":"2015","unstructured":"Ngoc Phuoc An Vo, Simone Magnolini, and Octavian Popescu. 2015. Paraphrase identification and semantic similarity in Twitter with simple features. 
In International Workshop on Natural Language Processing for Social Media (SocialNLP\u201915), 10\u201319."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3844\/jcssp.2016.1.18","article-title":"Cross-language semantic similarity of Arabic-English short phrases and sentences","volume":"12","author":"Alzahrani Salha","year":"2016","unstructured":"Salha Alzahrani. 2016. Cross-language semantic similarity of Arabic-English short phrases and sentences. Journal of Computer Sciences 12, 1 (2016), 1\u201318.","journal-title":"Journal of Computer Sciences"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/2387636.2387697"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACIT.2018.8672665"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJIIDS.2010.032437"},{"key":"e_1_2_1_9_1","volume-title":"Microsoft Research Paraphrase Corpus. (March","author":"Dolan Bill","year":"2005","unstructured":"Bill Dolan, Chris Brockett, and Chris Quirk. 2005. Microsoft Research Paraphrase Corpus. (March 2005). Microsoft Research."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s40595-016-0080-2"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2001"},{"key":"e_1_2_1_12_1","volume-title":"Clear grammer of Arabic language\u2014AlnHw AlwADH fy qwAEd AllgAh AlErbyAh","author":"AlJarem Ali","unstructured":"Ali AlJarem and Mustafa Ameen. 2004. Clear grammer of Arabic language\u2014AlnHw AlwADH fy qwAEd AllgAh AlErbyAh. 
Al-Dar Almysria Alsuadia for Publishing."},{"key":"e_1_2_1_13_1","unstructured":"Ahmad Mukhtar Omar. 1998. Semantics. Elm AldlAlAh. Book World. Qairo."},{"key":"e_1_2_1_14_1","unstructured":"Mohammad AlKholi. 2001. Semantics. Elm AldlAlAh (Elm AlmEnY). Dar Al-falah. Amman."},{"key":"e_1_2_1_15_1","volume-title":"Omar and others","author":"Ahmad","year":"1999","unstructured":"Ahmad M. Omar and others. 1999. Language and grammar exercises. AltdrybAt AllgwyAh wAlqwAEd. Kuwait University\u2014Art Collage."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/SMC.2013.92"},{"key":"e_1_2_1_17_1","unstructured":"Mohammad AlKholi. 1999. Transformation rules for Arabic language. qwAEd tHwylyAh llgAh AlErbyAh. Dar Al-Falah. Amman."},{"key":"e_1_2_1_18_1","volume-title":"Syntactic Structure","author":"Chomsky Noam","unstructured":"Noam Chomsky. 1957. Syntactic Structure. Mouton Publishers, The Hague, Paris."},{"key":"e_1_2_1_19_1","unstructured":"Abdel Haleem Benaissa. 2011. Transfer Grammar in Arabic Phrase. Dar Al-Kotob Al-Ilmiyah Lebanon."},{"key":"e_1_2_1_20_1","volume-title":"El-Beltagy","author":"Soliman Mohammad Abu Bakr","year":"2017","unstructured":"
Abu Bakr Soliman Mohammad, Kareem Eissa, and Samhaa R. El-Beltagy. 2017. AraVec: A set of Arabic word embedding models for use in Arabic NLP. Procedia Computer Science 117, (2017) 256\u2013265."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0031619"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1080\/19312450709336664"},{"key":"e_1_2_1_23_1","first-page":"115","article-title":"Correlation and agreement: Overview and clarification of competing concepts and measures","volume":"28","author":"Liu Jinyuan","year":"2016","unstructured":"Jinyuan Liu, Wan Tang, Guanqin Chen, Yin Lu, Changyong Feng, and Xin M Tu. 2016. Correlation and agreement: Overview and clarification of competing concepts and measures. Shanghai Arch Psychiatry 28, 2 (2016), 115\u2013120.","journal-title":"Shanghai Arch Psychiatry"},{"key":"e_1_2_1_24_1","volume-title":"Deep learning for semantic similarity. CS224d: Deep Learning for Natural Language Processing","author":"Sanborn Adrian","unstructured":"Adrian Sanborn and Jacek Skryzalin. 2015. Deep learning for semantic similarity. CS224d: Deep Learning for Natural Language Processing. Stanford, CA: Stanford University."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.130"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368691.3368708"},{"key":"e_1_2_1_27_1","volume-title":"Interactive Clustering for Data Exploration","author":"Brandt Joel R.","unstructured":"Joel R. Brandt, Jiayi Chong, and Sean Rosenbaum. 2006. 
Interactive Clustering for Data Exploration. Stanford University, Stanford, CA."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1089\/106652799318274"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-020-09753-4"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3446770","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3446770","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:31Z","timestamp":1750193251000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3446770"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,9]]},"references-count":29,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7,31]]}},"alternative-id":["10.1145\/3446770"],"URL":"https:\/\/doi.org\/10.1145\/3446770","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,9]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}