{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T14:03:14Z","timestamp":1774015394558,"version":"3.50.1"},"reference-count":29,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T00:00:00Z","timestamp":1773964800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Developing reliable Arabic question answering (QA) systems for Islamic fatwas requires datasets that capture the linguistic complexity and multi-step reasoning inherent in jurisprudential inquiries. However, the existing Arabic religious QA datasets primarily focus on direct retrieval or classification, often failing to address the multi-hop reasoning necessary for complex fatwa questions. To bridge this gap, we introduce MAFQA, a benchmark dataset specifically designed for multi-hop Arabic fatwa question answering. MAFQA was constructed from an extensive corpus of authentic fatwa records sourced from authoritative Islamic institutions. The dataset was developed via a semi-automated pipeline that integrates expert-guided identification of complex inquiries with a structured decomposition framework. This framework employs automated reasoning-pattern classification, semantic feature extraction, and template-guided annotation of subquestions and subanswers, followed by rigorous validation to ensure contextual grounding, logical coherence, and structural consistency. To evaluate the utility of the dataset, we conduct an extensive benchmarking study using Arabic-specialized, multilingual, and instruction-tuned language models across two primary tasks: question decomposition (QD) and generative question answering (QA). Performance is assessed using a comprehensive suite of lexical, semantic, relevance, and faithfulness metrics. Experimental results demonstrate that Arabic-specialized models consistently outperform their multilingual counterparts, with AraT5-base and AraBART achieving the highest performance in terms of lexical similarity, semantic alignment, and answer faithfulness.<\/jats:p>","DOI":"10.3390\/data11030064","type":"journal-article","created":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T11:44:46Z","timestamp":1774007086000},"page":"64","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MAFQA: A Dataset for Benchmarking Multi-Hop Arabic Fatwa Question Answering"],"prefix":"10.3390","volume":"11","author":[{"given":"Manal Ali","family":"Al-Qahtani","sequence":"first","affiliation":[{"name":"Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7479-7102","authenticated-orcid":false,"given":"Bader Fahad","family":"Alkhamees","sequence":"additional","affiliation":[{"name":"Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia"}]},{"given":"Mourad","family":"Ykhlef","sequence":"additional","affiliation":[{"name":"Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,20]]},"reference":[{"key":"ref_1","unstructured":"Al-Yahya, M. (2018, January 18\u201324). Towards automated fiqh school authorship attribution. Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mozannar, H., Maamouri, M., El-Haj, M., and Habash, N. (2019, January 1). Neural Arabic question answering. Proceedings of the 4th Arabic Natural Language Processing Workshop, Florence, Italy.","DOI":"10.18653\/v1\/W19-4612"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1162\/tacl_a_00416","article-title":"MasakhaNER: Named entity recognition for African languages","volume":"9","author":"Adelani","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Artetxe, M., Ruder, S., and Yogatama, D. (2020, January 5\u201310). On the cross-lingual transferability of monolingual representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.421"},{"key":"ref_5","unstructured":"Malhas, R., Mansour, W., and Elsayed, T. (2022, January 17). Qur\u2019an QA 2022: Overview of the first shared task on question answering over the Holy Qur\u2019an. Proceedings of the Qur\u2019an QA Workshop, Gyeongju, Republic of Korea."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3400396","article-title":"AyaTEC: Building a reusable verse-based test collection for Arabic question answering on the Holy Qur\u2019an","volume":"19","author":"Malhas","year":"2020","journal-title":"ACM Trans. Asian Low-Resour. Lang. Inf. Process."},{"key":"ref_7","unstructured":"Alnefaie, S., Atwell, E., and Alsalka, M.A. (2023, January 4\u20136). HAQA and QUQA: Constructing two Arabic question-answering corpora for the Quran and Hadith. Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, Varna, Bulgaria."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/s44443-025-00128-w","article-title":"Hajj-FQA: A benchmark Arabic dataset for developing question-answering systems on Hajj fatwas","volume":"37","author":"Aleid","year":"2025","journal-title":"J. King Saud Univ. Comput. Inf. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Alyemny, O., Al-Khalifa, H., and Mirza, A. (2023). A data-driven exploration of a new Islamic fatwas dataset for Arabic NLP tasks. Data, 8.","DOI":"10.3390\/data8100155"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"118","DOI":"10.47760\/ijcsmc.2021.v10i04.017","article-title":"Towards an automated Islamic fatwa system: Survey, dataset and benchmarks","volume":"10","author":"Munshi","year":"2021","journal-title":"Int. J. Comput. Sci. Mobile Comput."},{"key":"ref_11","unstructured":"Sidhoum, A.H., Mataoui, M.H., Sebbak, F., and Sma\u00efli, K. (2022, January 27\u201328). ACQAD: A dataset for arabic complex question answering. Proceedings of the International Conference on Cyber Security, Artificial Inteligence and Theoretical Computer Science, Boumerdes, Algeria."},{"key":"ref_12","unstructured":"Ali, M.A., Daftardar, N., Waheed, M., Qin, J., and Wang, D. (2025, January 19\u201324). MQA-KEAL: Multi-hop question answering under knowledge editing for Arabic language. Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates."},{"key":"ref_13","unstructured":"Sen, P., Aji, A.F., and Saffari, A. (2022, January 12\u201317). Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering. Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Saoudi, Y., and Gammoudi, M.M. (2023). A comprehensive review of arabic question answering datasets. Proceedings of the International Conference on Neural Information Processing, Springer Nature.","DOI":"10.1007\/978-981-99-8126-7_22"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., and Raffel, C. (2020). mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. arXiv, Available online: https:\/\/arxiv.org\/abs\/2010.11934.","DOI":"10.18653\/v1\/2021.naacl-main.41"},{"key":"ref_16","unstructured":"Hugging Face (2025, August 19). mT5-Base. Available online: https:\/\/huggingface.co\/google\/mt5-base."},{"key":"ref_17","unstructured":"Hugging Face (2025, August 19). AraBART. Available online: https:\/\/huggingface.co\/moussaKam\/AraBART."},{"key":"ref_18","unstructured":"Hugging Face (2025, August 19). AraT5-MSA-Base. Available online: https:\/\/huggingface.co\/UBC-NLP\/AraT5-msa-base."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Nagoudi, E.M.B., Elmadany, A., and Abdul-Mageed, M. (2021). AraT5: Text-to-Text Transformers for Arabic Language Generation. arXiv, Available online: https:\/\/arxiv.org\/abs\/2109.12068.","DOI":"10.18653\/v1\/2022.acl-long.47"},{"key":"ref_20","unstructured":"Hugging Face (2025, August 19). Arabic-T5-Small. Available online: https:\/\/huggingface.co\/flax-community\/arabic-t5-small."},{"key":"ref_21","unstructured":"Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., and Huang, F. (2023). Qwen Technical Report. arXiv."},{"key":"ref_22","unstructured":"Mistral AI (2026, March 07). Mistral-7B-Instruct-v0.2. Hugging Face Model Card. Available online: https:\/\/huggingface.co\/mistralai\/Mistral-7B-Instruct-v0.2."},{"key":"ref_23","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022, January 25\u201329). LoRA: Low-rank adaptation of large language models. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_24","unstructured":"Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., and Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. arXiv, Available online: https:\/\/arxiv.org\/abs\/1904.09675."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"111004","DOI":"10.1016\/j.dib.2024.111004","article-title":"Arabic paraphrased parallel synthetic dataset","volume":"57","year":"2024","journal-title":"Data Brief"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Khallaf, N., and Sharoff, S. (2022). Towards Arabic Sentence Simplification via Classification and Generative Approaches. arXiv, Available online: https:\/\/arxiv.org\/abs\/2204.09292.","DOI":"10.18653\/v1\/2022.wanlp-1.5"},{"key":"ref_27","unstructured":"Kmainasi, M.B., Shahroor, A.E., and Al-Ghraibah, A. (2025). Can Large Language Models Predict the Outcome of Judicial Decisions?. arXiv, Available online: https:\/\/arxiv.org\/abs\/2501.09768."},{"key":"ref_28","unstructured":"Chen, J., Li, J., Peng, Z., Wang, W., Ren, Y., Shi, L., and Hu, X. (2025). LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, Z., Liu, Y., Shi, L., Wang, Z.J., Chen, X., Zhao, Y., and Ren, F. (2025). MDEval: Evaluating and enhancing markdown awareness in large language models. Proceedings of the ACM Web Conference 2025, ACM.","DOI":"10.1145\/3696410.3714674"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/11\/3\/64\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T11:52:42Z","timestamp":1774007562000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/11\/3\/64"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,20]]},"references-count":29,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["data11030064"],"URL":"https:\/\/doi.org\/10.3390\/data11030064","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,20]]}}}