{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T15:54:59Z","timestamp":1781020499223,"version":"3.54.1"},"reference-count":42,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T00:00:00Z","timestamp":1758240000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deanship of Scientific Research, Islamic University of Madinah, Madinah, Saudi Arabia"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Automated essay scoring (AES) has become an essential tool in educational assessment. However, applying AES to the Arabic language presents notable challenges, primarily due to the lack of labeled datasets. This data scarcity hampers the development of reliable machine learning models and slows progress in Arabic natural language processing for educational use. While manual annotation by human experts remains the most accurate method for essay evaluation, it is often too costly and time-consuming to create large-scale datasets, especially for low-resource languages like Arabic. In this work, we introduce a human\u2013AI collaborative framework designed to overcome the shortage of scored Arabic essays. Leveraging QAES, a high-quality annotated dataset, our approach uses Large Language Models (LLMs) to generate multidimensional essay evaluations across seven key writing traits: Relevance, Organization, Vocabulary, Style, Development, Mechanics, and Structure. To ensure accuracy and consistency, we design prompting strategies and validation procedures tailored to each trait. This system is then applied to two unannotated Arabic essay datasets: ZAEBUC and QALB. As a result, we introduce ZaQQ, a newly annotated dataset that merges ZAEBUC, QAES, and QALB. Our findings demonstrate that human\u2013AI collaboration can significantly enhance the availability of labeled resources without compromising assessment quality. The proposed framework serves as a scalable and replicable model for addressing data annotation challenges in low-resource languages and supports the broader goal of expanding access to automated educational assessment tools where expert evaluation is limited.<\/jats:p>","DOI":"10.3390\/data10090148","type":"journal-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T10:08:58Z","timestamp":1758276538000},"page":"148","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["ZaQQ: A New Arabic Dataset for Automatic Essay Scoring via a Novel Human\u2013AI Collaborative Framework"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-7565-3172","authenticated-orcid":false,"given":"Yomna","family":"Elsayed","sequence":"first","affiliation":[{"name":"Computer and Systems Engineering Department, Alexandria University, Alexandria 21526, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3729-5079","authenticated-orcid":false,"given":"Emad","family":"Nabil","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6149-1718","authenticated-orcid":false,"given":"Marwan","family":"Torki","sequence":"additional","affiliation":[{"name":"Computer and Systems Engineering Department, Alexandria University, Alexandria 21526, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0367-7370","authenticated-orcid":false,"given":"Safiullah","family":"Faizullah","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6570-7455","authenticated-orcid":false,"given":"Ayman","family":"Khalafallah","sequence":"additional","affiliation":[{"name":"Computer and Systems Engineering Department, Alexandria University, Alexandria 21526, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,19]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"Automated Essay Scoring for Classroom Assessment","volume":"4","author":"Attali","year":"2006","journal-title":"J. Technol. Learn. Assess."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Shermis, M.D., and Burstein, J.C. (2003). Automated Essay Scoring: A Cross-Disciplinary Perspective, Lawrence Erlbaum Associates Publishers.","DOI":"10.4324\/9781410606860"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Habash, N.Y. (2010). Introduction to Arabic Natural Language Processing, Morgan & Claypool Publishers. [1st ed.].","DOI":"10.1007\/978-3-031-02139-8"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.edurev.2007.05.002","article-title":"The use of scoring rubrics: Reliability, validity and educational consequences","volume":"2","author":"Jonsson","year":"2007","journal-title":"Educ. Res. Rev."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.asw.2014.05.001","article-title":"When \u201cthe state of the art\u201d is counting words","volume":"21","author":"Perelman","year":"2014","journal-title":"Assess. Writ."},{"key":"ref_6","first-page":"1877","article-title":"Language Models are Few-Shot Learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ghazawi, R., and Simpson, E. (2025). How well can LLMs Grade Essays in Arabic?. arXiv.","DOI":"10.1016\/j.caeai.2025.100449"},{"key":"ref_8","unstructured":"Zaidan, O., and Callison-Burch, C. (2011, January 19\u201324). The Arabic Online Commentary Dataset. Proceedings of the ACL, Portland, OR, USA."},{"key":"ref_9","unstructured":"Mathias, S., and Bhattacharyya, P. (2018, January 7\u201312). ASAP++: Enriching the ASAP Automated Essay Grading Dataset with Essay Attribute Scores. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_10","unstructured":"Hamner, B., Morgan, J., Shermis, M., and Vander Ark, T. (2024, September 17). The Hewlett Foundation: Automated Essay Scoring. Kaggle. Available online: https:\/\/kaggle.com\/competitions\/asap-aes."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Blanchard, D., Tetreault, J., Higgins, D., Cahill, A., and Chodorow, M. (2013). TOEFL11: A Corpus of Non-Native English, Educational Testing Service. Report No. RR-13-25.","DOI":"10.1002\/j.2333-8504.2013.tb02331.x"},{"key":"ref_12","unstructured":"Ahmed, A.M., Myhill, D., Abdollahzadeh, E., McCallum, L., Zaghouani, W., Rezk, L., Jrad, A., and Zhang, X. (2022). Qatari Corpus of Argumentative Writing LDC2022T04, Web Download; Linguistic Data Consortium."},{"key":"ref_13","unstructured":"Zaghouani, W., Ahmed, A., Zhang, X., and Rezk, L. (2024, January 20\u201325). QCAW 1.0: Building a Qatari Corpus of Student Argumentative Writing. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Bashendy, M., Albatarni, S., Eltanbouly, S., Zahran, E., Elhuseyin, H., Elsayed, T., Massoud, W., and Bouamor, H. (2024, January 16). QAES: First Publicly-Available Trait-Specific Annotations for Automated Scoring of Arabic Essays. Proceedings of the ARABICNLP, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.arabicnlp-1.28"},{"key":"ref_15","unstructured":"Habash, N., and Palfreyman, D.M. (2022, January 20\u201325). ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus. Proceedings of the International Conference on Language Resources and Evaluation, Marseille, France."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mohit, B., Rozovskaya, A., Habash, N., Zaghouani, W., and Obeid, O. (2014, January 25). The First QALB Shared Task on Automatic Text Correction for Arabic. Proceedings of the First Workshop on Arabic Natural Language Processing, Doha, Qatar.","DOI":"10.3115\/v1\/W14-3605"},{"key":"ref_17","unstructured":"Ghazawi, R., and Simpson, E. (2024). Automated essay scoring in Arabic: A dataset and analysis of a BERT-based system. arXiv."},{"key":"ref_18","unstructured":"Alfarah, Z., Habash, N., and Saddiki, H. (2021, January 19). ARAScore: Holistic and Analytic Scoring for Arabic Essays. Proceedings of the WANLP, Kyiv, Ukraine."},{"key":"ref_19","unstructured":"Ouahrani, L., and Bennouar, D. (2020, January 11\u201316). AR-ASAG: An Arabic Dataset for Automatic Short Answer Grading. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Alhafni, B., Inoue, G., Khairallah, C., and Habash, N. (2023, January 6\u201310). Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.396"},{"key":"ref_21","unstructured":"Inoue, G., Alhafni, B., Baimukan, N., Bouamor, H., and Habash, N. (2021, January 19). The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models. Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine."},{"key":"ref_22","unstructured":"Antoun, W., Baly, F., and Hajj, H. (2020, January 11\u201316). AraBERT: Transformer-based Model for Arabic Language Understanding. Proceedings of the LREC 2020 Workshop Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1736","DOI":"10.1016\/j.ipm.2019.05.008","article-title":"AAEE\u2013Automated evaluation of students\u2019 essays in Arabic language","volume":"56","author":"Azmi","year":"2019","journal-title":"Inf. Process. Manag."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1093\/logcom\/exs070","article-title":"A Corpus-Based Finite-State Morphological Toolkit for Contemporary Arabic","volume":"24","author":"Attia","year":"2013","journal-title":"J. Log. Comput."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Taghipour, K., and Ng, H.T. (2016, January 1\u20135). A Neural Approach to Automated Essay Scoring. Proceedings of the EMNLP, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1193"},{"key":"ref_26","first-page":"1","article-title":"Automatic Text Scoring Using Neural Networks","volume":"45","author":"Alikaniotis","year":"2019","journal-title":"Comput. Linguist."},{"key":"ref_27","first-page":"1","article-title":"ARBERT: Effective Arabic Tokenization","volume":"48","author":"Elmadany","year":"2022","journal-title":"Comput. Linguist."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"142555","DOI":"10.1109\/ACCESS.2024.3470728","article-title":"Automatic Scoring of Arabic Essays: A Parameter-Efficient Approach for Grammatical Assessment","volume":"12","author":"Mahmoud","year":"2024","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Abdelrehim, M., Torki, M., and El-Makky, N. (2025, January 10\u201312). Hybrid LLM and Rule-Based Synthetic Data Generation for Arabic Grammatical Error Correction. Proceedings of the 2nd IEEE International Conference on Machine Intelligence and Smart Innovation (ICMISI 2025), Alexandria, Egypt.","DOI":"10.1109\/ICMISI65108.2025.11115884"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Qwaider, C., Alhafni, B., Chirkunov, K., Habash, N., and Briscoe, T. (2025). Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection. arXiv.","DOI":"10.18653\/v1\/2025.bea-1.40"},{"key":"ref_31","unstructured":"Afrizal Doewes, A., Kurdhi, N.A., and Saxena, A. (2023, January 11\u201314). Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring. Proceedings of the 16th International Conference on Educational Data Mining, Bengaluru, India."},{"key":"ref_32","unstructured":"Laurer, M., van Atteveldt, W., Casas, A., and Welbers, K. (2023). Building Efficient Universal Classifiers with Natural Language Inference. arXiv."},{"key":"ref_33","unstructured":"Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T.T., and Moazam, H. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy Function Approximation: A Gradient Boosting Machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely Randomized Trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn."},{"key":"ref_37","first-page":"6638","article-title":"CatBoost: Unbiased Boosting with Categorical Features","volume":"31","author":"Prokhorenkova","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hosmer, D.W., and Lemeshow, S. (2000). Applied Logistic Regression, Wiley. [2nd ed.].","DOI":"10.1002\/0471722146"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-Vector Networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_40","unstructured":"Zhang, H. (2004, January 12\u201314). The Optimality of Naive Bayes. Proceedings of the 17th International FLAIRS Conference, Miami Beach, FL, USA."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning Representations by Back-propagating Errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"ref_42","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/9\/148\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:48:54Z","timestamp":1760035734000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/9\/148"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,19]]},"references-count":42,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["data10090148"],"URL":"https:\/\/doi.org\/10.3390\/data10090148","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,19]]}}}