{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T17:05:21Z","timestamp":1770915921701,"version":"3.50.1"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2023,5,8]],"date-time":"2023-05-08T00:00:00Z","timestamp":1683504000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>Machine reading comprehension (MRC) requires machines to read and answer questions about a given text. This can be achieved through either predicting answers or extracting them. Extracting answers from text involves predicting the first and last index of the answer span within the paragraph. Training machines to answer questions requires datasets that are created for such a purpose. The lack of availability of benchmarking datasets for the Arabic language has hindered research into machine reading comprehension from Arabic text. The aim of this article is to propose an Arabic Span-Extraction-based Reading Comprehension Benchmark (ASER) and complement it with neural baseline models for performance evaluations. Detailed steps are depicted for building and evaluating ASER, which is an Arabic dataset created manually for the task of machine reading comprehension. It contains 10,000 records from different domains and is divided into training and testing sets. The results of ASER evaluation led to the conclusion that it is a challenging benchmark since the answers have varying lengths and human performance resulted in an exact match of 42%. 
On the other hand, two main baseline models were the focus of ASER experimentation: the sequence-to-sequence (Seq2Seq) model with different neural networks and the bidirectional attention flow (BIDAF) model. These experiments were implemented using different embeddings, and the results showed an exact match with lower values than human performance.<\/jats:p>","DOI":"10.1145\/3579047","type":"journal-article","created":{"date-parts":[[2023,1,10]],"date-time":"2023-01-10T12:15:12Z","timestamp":1673352912000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Arabic Span Extraction-based Reading Comprehension Benchmark (ASER) and Neural Baseline Models"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4386-0823","authenticated-orcid":false,"given":"Mariam M.","family":"Biltawi","sequence":"first","affiliation":[{"name":"School of Computing and Informatics, Al Hussein Technical University, Amman, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7067-5658","authenticated-orcid":false,"given":"Arafat","family":"Awajan","sequence":"additional","affiliation":[{"name":"Computer Science Department, Princess Sumaya University for Technology, Amman, Jordan and Computer Science Department, Mutah University, Karak, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0391-0160","authenticated-orcid":false,"given":"Sara","family":"Tedmori","sequence":"additional","affiliation":[{"name":"Computer Science Department, Princess Sumaya University for Technology, Amman, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,5,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-10674-4","volume-title":"Feature Selection and Enhanced Krill Herd Algorithm for Text 
Document Clustering","author":"Abualigah Laith Mohammad Qasim","year":"2019","unstructured":"Laith Mohammad Qasim Abualigah. 2019. Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering. Springer."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2018.05.003"},{"key":"e_1_3_2_4_2","volume-title":"Proceedings of the 6th Arabic Natural Language Processing Workshop","author":"Albilali Eman","year":"2021","unstructured":"Eman Albilali, Nora Altwairesh, and Manar Hosny. 2021. What does BERT learn from Arabic machine reading comprehension datasets? In Proceedings of the 6th Arabic Natural Language Processing Workshop."},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the LREC 2020 Workshop Language Resources and Evaluation Conference","author":"Antoun Wissam","year":"2020","unstructured":"Wissam Antoun, Fady Baly, and Hazem Hajj. 2020. AraBERT: Transformer-based model for Arabic language understanding. In Proceedings of the LREC 2020 Workshop Language Resources and Evaluation Conference."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3074950"},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the International Arab Conference on Information Technology","author":"Biltawi Mariam","year":"2017","unstructured":"Mariam Biltawi, Arafat Awajan, and Sara Tedmori. 2017. Towards building a frame-based ontology for the Arabic language. In Proceedings of the International Arab Conference on Information Technology."},{"key":"e_1_3_2_8_2","volume-title":"Proceedings of the 35th IBIMA Conference","author":"Biltawi Mariam","year":"2020","unstructured":"Mariam Biltawi, Arafat Awajan, and Sara Tedmori. 2020. Towards building an open-domain corpus for Arabic reading comprehension. 
In Proceedings of the 35th IBIMA Conference."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.5555\/AAI28114605"},{"key":"e_1_3_2_10_2","first-page":"2358","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics","author":"Chen Danqi","year":"2016","unstructured":"Danqi Chen, Jason Bolton, and Christopher D. Manning. 2016. A thorough examination of the CNN\/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2358\u20132367."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3501399"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"8784","DOI":"10.18653\/v1\/2021.emnlp-main.693","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Dzendzik Daria","year":"2021","unstructured":"Daria Dzendzik, Carl Vogel, and Jennifer Foster. 2021. English machine reading comprehension datasets: A survey. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 8784\u20138804."},{"key":"e_1_3_2_14_2","volume-title":"Proceedings of the KDIR","author":"Eid Ahmad","year":"2019","unstructured":"Ahmad Eid, Nagwa M. El-Makky, and Khaled Nagi. 2019. Towards machine comprehension of Arabic text. In Proceedings of the KDIR."},{"key":"e_1_3_2_15_2","unstructured":"Aysu Ezen-Can. 2020. A comparison of LSTM and BERT for small corpus. arXiv:2009.05451. Retrieved from https:\/\/arxiv.org\/abs\/2009.05451"},{"key":"e_1_3_2_16_2","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation","author":"Grave Edouard","year":"2018","unstructured":"Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. 
In Proceedings of the 11th International Conference on Language Resources and Evaluation."},{"key":"e_1_3_2_17_2","first-page":"1","article-title":"I3rab: A new Arabic dependency treebank based on Arabic grammatical theory","author":"Halabi Dana","year":"2021","unstructured":"Dana Halabi, Ebaa Fayyoumi, and Arafat Awajan. 2021. I3rab: A new Arabic dependency treebank based on Arabic grammatical theory. Transactions on Asian and Low-resource Language Information Processing 21, 2 (2021), 1\u201332.","journal-title":"Transactions on Asian and Low-resource Language Information Processing"},{"key":"e_1_3_2_18_2","article-title":"Teaching machines to read and comprehend","author":"Moritz Hermann Karl","year":"2015","unstructured":"Karl Moritz Hermann, Tom\u00e1\u0161 Ko\u010disk\u00fd, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. Advances in Neural Information Processing Systems 28 (2015).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_20_2","volume-title":"Recurrent Neural Networks","author":"Hu Xiaolin","year":"2008","unstructured":"Xiaolin Hu and P. Balasubramaniam. 2008. (Eds.). Recurrent Neural Networks, Vol. 400. I-Tech Education and Publishing KG."},{"key":"e_1_3_2_21_2","first-page":"1","article-title":"Biomedical question answering: A survey of approaches and challenges","author":"Jin Qiao","year":"2022","unstructured":"Qiao Jin, Zheng Yuan, Guangzhi Xiong, Qianlan Yu, Huaiyuan Ying, Chuanqi Tan, Mosha Chen, Songfang Huang, Xiaozhong Liu, and Sheng Yu. 2022. Biomedical question answering: A survey of approaches and challenges. 
ACM Computing Surveys 55, 2 (2022), 1\u201336.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_2_22_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Lee Kenton","year":"2016","unstructured":"Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, and Jonathan Berant. 2016. Learning recurrent span representations for extractive question answering. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_23_2","article-title":"Distributed representations of words and phrases and their compositionality","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26 (2013).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"108","DOI":"10.18653\/v1\/W19-4612","volume-title":"Proceedings of the 4th Arabic Natural Language Processing Workshop","author":"Mozannar Hussein","year":"2019","unstructured":"Hussein Mozannar, Karl El Hajal, Elie Maamary, and Hazem Hajj. 2019. Neural Arabic question answering. In Proceedings of the 4th Arabic Natural Language Processing Workshop. 108\u2013118."},{"key":"e_1_3_2_25_2","article-title":"MS MARCO: A human generated machine reading comprehension dataset","author":"Nguyen Tri","year":"2016","unstructured":"Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. 
CoCo@ NIPS 2640 (2016), 660.","journal-title":"CoCo@ NIPS"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.3390\/math10030310"},{"key":"e_1_3_2_27_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, the 30th innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence","author":"Pan Boyuan","year":"2017","unstructured":"Boyuan Pan, Hao Li, Zhou Zhao, Bin Cao, Deng Cai, and Xiaofei He. 2017. MEMEN: Multi-layer embedding with memory networks for machine comprehension. In Proceedings of the AAAI Conference on Artificial Intelligence, the 30th innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_3_2_29_2","article-title":"Bi-directional attention flow for machine comprehension","author":"Seo Minjoon","year":"2016","unstructured":"Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bi-directional attention flow for machine comprehension. International Conference on Learning Representations.","journal-title":"International Conference on Learning Representations"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2017.10.117"},{"key":"e_1_3_2_31_2","article-title":"Sequence to sequence learning with neural networks","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. 
Advances in Neural Information Processing Systems 27 (2014).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_32_2","first-page":"191","volume-title":"Proceedings of the 2nd Workshop on Representation Learning for NLP","author":"Trischler Adam","year":"2016","unstructured":"Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2016. NewsQA: A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP. 191\u2013200."},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the Machine Translation Summit XVII. European Association for Machine Translation","author":"Vanmassenhove Eva","year":"2019","unstructured":"Eva Vanmassenhove, Dimitar Shterionov, and Andy Way. 2019. Lost in translation: Loss and decay of linguistic richness in machine translation. In Proceedings of the Machine Translation Summit XVII. European Association for Machine Translation."},{"key":"e_1_3_2_34_2","first-page":"1","volume-title":"Proceedings of the ICLR 2017: International Conference on Learning Representations. Toulon","author":"Wang Shuohang","year":"2017","unstructured":"Shuohang Wang and Jing Jiang. 2017. Machine comprehension using match-LSTM and answer pointer. In Proceedings of the ICLR 2017: International Conference on Learning Representations. Toulon. 1\u201315."},{"key":"e_1_3_2_35_2","volume-title":"R-Net: machine reading comprehension with self-matching networks","author":"Wang W.","year":"2017","unstructured":"W. Wang. 2017. R-Net: machine reading comprehension with self-matching networks. Natural Language Computer Group, Microsoft Research."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1018"},{"key":"e_1_3_2_37_2","unstructured":"Zhiguo Wang, Haitao Mi, Wael Hamza, and Radu Florian. 2016. Multi-Perspective context matching for machine comprehension. arXiv:1612.04211. 
Retrieved from https:\/\/arxiv.org\/abs\/1612.04211"},{"key":"e_1_3_2_38_2","first-page":"271","volume-title":"Proceedings of the 21st Conference on Computational Natural Language Learning","author":"Weissenborn Dirk","year":"2017","unstructured":"Dirk Weissenborn, Georg Wiese, and Laura Seiffe. 2017. Making neural QA as simple as possible but not simpler. In Proceedings of the 21st Conference on Computational Natural Language Learning. 271\u2013280."},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"94","DOI":"10.18653\/v1\/W17-4413","volume-title":"Proceedings of the 3rd Workshop on Noisy User-generated Text","author":"Welbl Johannes","year":"2017","unstructured":"Johannes Welbl, Nelson F. Liu, and Matt Gardner. 2017. Crowdsourcing multiple choice science questions. In Proceedings of the 3rd Workshop on Noisy User-generated Text. 94\u2013106."},{"key":"e_1_3_2_40_2","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics","author":"Xie Pengtao","year":"2017","unstructured":"Pengtao Xie and Eric Xing. 2017. A constituent-centric neural architecture for reading comprehension. 
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579047","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3579047","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:38:05Z","timestamp":1750178285000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579047"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,8]]},"references-count":39,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3579047"],"URL":"https:\/\/doi.org\/10.1145\/3579047","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,8]]},"assertion":[{"value":"2021-07-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-19","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}