{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T11:34:16Z","timestamp":1773747256491,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T00:00:00Z","timestamp":1773619200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deanship of Scientific Research, Islamic University of Madinah, Saudi Arabia"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Automated Short Answer Grading (ASAG) has garnered significant attention in the field of educational technology due to its potential to improve the efficiency, scalability, and consistency of student assessments. This study introduces a novel dataset of 651 student responses from a Database Transaction course exam at Beni-Suef University, referred to as the Beni-Suef Transaction Processing (BeSTraP) dataset. The BeSTraP is specifically designed to support ASAG evaluation. To assess ASAG performance, five approaches were employed: string-based similarity, semantic similarity, a hybrid of both, fine-tuning transformer-based models, and the application of Large Language Models (LLMs). The experimental results indicated that fine-tuned transformers, particularly GPT-2, achieved the highest Pearson correlation with human scores (0.8813) on the new dataset and maintained robust performance on the Mohler benchmark (0.7834). In addition to grading, the framework integrates automated feedback generation through LLMs, further enriching the assessment process. 
This research contributes (i) a novel, domain-specific dataset derived from an actual university examination, (ii) a comprehensive comparison of traditional and transformer-based approaches, and (iii) evidence of the efficacy of fine-tuned models in providing accurate and scalable grading solutions. The created dataset will be publicly available for the community.<\/jats:p>","DOI":"10.3390\/data11030057","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T10:17:44Z","timestamp":1773742664000},"page":"57","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Leveraging Transformers and LLMs for Automated Grading and Feedback Generation Using a Novel Dataset"],"prefix":"10.3390","volume":"11","author":[{"given":"Asmaa","family":"G. Khalf","sequence":"first","affiliation":[{"name":"Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni-Suef 62511, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3729-5079","authenticated-orcid":false,"given":"Emad","family":"Nabil","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6847-2632","authenticated-orcid":false,"given":"Wael","family":"H. Gomaa","sequence":"additional","affiliation":[{"name":"Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni-Suef 62511, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4540-3650","authenticated-orcid":false,"given":"Oussama","family":"Benrhouma","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia"}]},{"given":"Amira","family":"M. 
El-Mandouh","sequence":"additional","affiliation":[{"name":"Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni-Suef 62511, Egypt"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1504\/IJTEL.2019.096734","article-title":"A systemic approach to leveraging student engagement in collaborative learning to improve online engineering education","volume":"11","author":"Qiu","year":"2019","journal-title":"Int. J. Technol. Enhanc. Learn."},{"key":"ref_2","first-page":"617","article-title":"Progress and challenges for automated scoring and feedback systems for large-scale assessments","volume":"2","author":"Whitelock","year":"2018","journal-title":"Int. Handb. Prim. Second. Educ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1007\/s10639-018-9797-0","article-title":"Online assessments: Exploring perspectives of university students","volume":"24","author":"Khan","year":"2019","journal-title":"Educ. Inf. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"87","DOI":"10.11120\/ndir.2013.00009","article-title":"E-assessment: Past, present and future","volume":"9","author":"Jordan","year":"2013","journal-title":"New Dir."},{"key":"ref_5","unstructured":"Ashton, H.S., Beevers, C.E., Milligan, C.D., Schofield, D.K., Thomas, R.C., and Youngson, M.A. (2006). Moving beyond objective testing in online assessment. Online Assessment and Measurement: Case Studies from Higher Education, K-12 and Corporate, IGI Global."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1080\/03075079.2019.1654450","article-title":"Self-regulation in open-ended online assignment tasks: The importance of initial task interpretation and goal setting","volume":"46","author":"Beckman","year":"2021","journal-title":"Stud. High. 
Educ."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1016\/j.cogsys.2019.09.025","article-title":"Automatic grading and hinting in open-ended text questions","volume":"59","author":"Sychev","year":"2020","journal-title":"Cogn. Syst. Res."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ahmed, B., Kagita, M., Wijenayake, C.A., and Ravishankar, J. (2018, January 4\u20137). Implementation guidelines for an automated grading tool to assess short answer questions on digital circuit design course. Proceedings of the 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), Wollongong, NSW, Australia.","DOI":"10.1109\/TALE.2018.8615228"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1007\/s40593-022-00289-z","article-title":"Towards trustworthy autograding of short, multi-lingual, multi-type answers","volume":"33","author":"Schneider","year":"2023","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Riezler, S., Simianer, P., and Haas, C. (2014, January 22\u201327). Response-based learning for grounded machine translation. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-1083"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sayeed, M.A., and Gupta, D. (2022, January 14\u201316). Automate Descriptive Answer Grading using Reference based Models. Proceedings of the 2022 OITS International Conference on Information Technology (OCIT), Bhubaneswar, India.","DOI":"10.1109\/OCIT56763.2022.00057"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1007\/s40593-023-00391-w","article-title":"Paraphrase Generation and Supervised Learning for Improved Automatic Short Answer Grading","volume":"34","author":"Ouahrani","year":"2024","journal-title":"Int. J. Artif. Intell. 
Educ."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"96332","DOI":"10.1109\/ACCESS.2024.3420890","article-title":"A Hybrid Approach for Automated Short Answer Grading","volume":"12","author":"Kaya","year":"2024","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"23173","DOI":"10.1609\/aaai.v38i21.30363","article-title":"Automatic short answer grading for finnish with chatgpt","volume":"Volume 38","author":"Chang","year":"2024","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1007\/s42979-024-02954-7","article-title":"Empowering Educators: Automated Short Answer Grading with Inconsistency Check and Feedback Integration using Machine Learning","volume":"5","author":"Simha","year":"2024","journal-title":"SN Comput. Sci."},{"key":"ref_16","unstructured":"Klein, M., Krupka, D., Winter, C., Gergeleit, M., and Marti, L. (2024). Computer-Assisted Short Answer Grading Using Large Language Models and Rubrics. Proceedings of the Informatik 2024, Wiesbaden, Germany, 24\u201326 September 2024, Gesellschaft f\u00fcr Informatik eV."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s42979-023-01682-8","article-title":"Novel framework for improving the correctness of reference answers to enhance results of ASAG systems","volume":"4","year":"2023","journal-title":"SN Comput. Sci."},{"key":"ref_18","first-page":"113","article-title":"Automatic essay scoring for Arabic short answer questions using text mining techniques","volume":"14","author":"Meccawy","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_19","first-page":"719","article-title":"Evaluation of Short Answers Using Domain Specific Embedding and Siamese Stacked BiLSTM with Contrastive Loss","volume":"37","author":"Patil","year":"2023","journal-title":"Rev. D\u2019Intell. 
Artif."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Jiang, L., and Bosch, N. (2024, January 18\u201320). Short answer scoring with GPT-4. Proceedings of the Eleventh ACM Conference on Learning@ Scale, Atlanta, GA, USA.","DOI":"10.1145\/3657604.3664685"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"6025","DOI":"10.24996\/ijs.2023.64.11.44","article-title":"Automatic Short Answer Grading System Based on Semantic Networks and Support Vector Machine","volume":"64","author":"Hameed","year":"2023","journal-title":"Iraqi J. Sci."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gomaa, W.H., and Fahmy, A.A. (2020). Ans2vec: A scoring system for short answers. Proceedings of the International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019) 4, Springer.","DOI":"10.1007\/978-3-030-14118-9_59"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Sawatzki, J., Schlippe, T., and Benner-Wickner, M. (2021). Deep learning techniques for automatic short answer grading: Predicting scores for English and German answers. Proceedings of the International Conference on Artificial Intelligence in Education Technology, Springer.","DOI":"10.1007\/978-981-16-7527-0_5"},{"key":"ref_24","unstructured":"Gaddipati, S.K., Nair, D., and Pl\u00f6ger, P.G. (2020). Comparative evaluation of pretrained transfer learning models on automatic short answer grading. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Prabhudesai, A., and Duong, T.N. (2019, January 10\u201313). Automatic short answer grading using Siamese bidirectional LSTM based regression. 
Proceedings of the 2019 IEEE International Conference on Engineering, Technology and Education (TALE), Yogyakarta, Indonesia.","DOI":"10.1109\/TALE48000.2019.9226026"},{"key":"ref_26","first-page":"397","article-title":"Automatic short answer scoring based on paragraph embeddings","volume":"9","author":"Hassan","year":"2018","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kumar, S., Chakrabarti, S., and Roy, S. (2017, January 19\u201325). Earth Mover\u2019s Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia.","DOI":"10.24963\/ijcai.2017\/284"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Saeed, M.M., and Gomaa, W.H. (2022, January 8\u20139). An ensemble-based model to improve the accuracy of automatic short answer grading. Proceedings of the 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt.","DOI":"10.1109\/MIUCC55081.2022.9781737"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Dada, I.D., Akinwale, A.T., and Tunde-Adeleke, T.J. (2025). A Structured Dataset for Automated Grading: From Raw Data to Processed Dataset. Data, 10.","DOI":"10.3390\/data10060087"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wijanto, M.C., and Yong, H.S. (2024). Combining balancing dataset and sentencetransformers to improve short answer grading performance. Appl. Sci., 14.","DOI":"10.3390\/app14114532"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Abdul Salam, M., El-Fatah, M.A., and Hassan, N.F. (2022). Automatic grading for Arabic short answer questions using optimized deep learning model. 
PLoS ONE, 17.","DOI":"10.1371\/journal.pone.0272269"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1871","DOI":"10.1007\/s41060-024-00576-z","article-title":"SPRAG: Building and benchmarking a Short Programming-Related Answer Grading dataset","volume":"20","author":"Bonthu","year":"2024","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4295","DOI":"10.1007\/s10115-023-01892-9","article-title":"GradeAid: A framework for automatic short answers grading in educational contexts\u2014Design, implementation and evaluation","volume":"65","author":"Guarino","year":"2023","journal-title":"Knowl. Inf. Syst."},{"key":"ref_34","first-page":"115","article-title":"Short answer grading using string similarity and corpus-based similarity","volume":"3","author":"Gomaa","year":"2012","journal-title":"Int. J. Adv. Comput. Sci. Appl. (IJACSA)"},{"key":"ref_35","unstructured":"Little, C.C. (2025, March 29). Abydos: A Python Library for Text Processing. Available online: https:\/\/abydos.readthedocs.io\/en\/latest\/index.html."},{"key":"ref_36","first-page":"13","article-title":"A survey of text similarity approaches","volume":"68","author":"Gomaa","year":"2013","journal-title":"Int. J. Comput. Appl."},{"key":"ref_37","unstructured":"Sung, C., Dhamecha, T.I., and Mukhi, N. (2019). Improving short answer grading using transformer-based pre-training. Proceedings of the Artificial Intelligence in Education: 20th International Conference, AIED 2019, Chicago, IL, USA, 25\u201329 June 2019, Proceedings, Part I 20, Springer."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/s44163-024-00147-y","article-title":"Performance of the pre-trained large language model GPT-4 on automated short answer grading","volume":"4","author":"Kortemeyer","year":"2024","journal-title":"Discov. Artif. 
Intell."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1007\/s40593-024-00403-3","article-title":"GPT-4 in education: Evaluating aptness, reliability, and loss of coherence in solving calculus problems and grading submissions","volume":"35","author":"Gandolfi","year":"2024","journal-title":"Int. J. Artif. Intell. Educ."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/11\/3\/57\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T10:49:10Z","timestamp":1773744550000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/11\/3\/57"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,16]]},"references-count":39,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["data11030057"],"URL":"https:\/\/doi.org\/10.3390\/data11030057","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,16]]}}}