{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T06:42:49Z","timestamp":1780468969333,"version":"3.54.1"},"reference-count":27,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T00:00:00Z","timestamp":1749168000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Covenant University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>The increasing volume of student assessments, particularly open-ended responses, presents a significant challenge for educators in ensuring grading accuracy, consistency, and efficiency. This paper presents a structured dataset designed for the development and evaluation of automated grading systems in higher education. The primary objective is to create a high-quality dataset that facilitates the development and evaluation of natural language processing (NLP) models for automated grading. The dataset comprises student responses to open-ended questions from the Management Information Systems (MIS221) and Project Management (MIS415) courses at Covenant University, collected during the 2022\/2023 academic session. The responses were originally handwritten, scanned, and transcribed into Word documents. Each response is paired with corresponding scores assigned by human graders, following a detailed marking guide. To assess the dataset\u2019s potential for automated grading applications, several machine learning and transformer-based models were tested, including TF-IDF with Linear Regression, TF-IDF with Cosine Similarity, BERT, SBERT, RoBERTa, and Longformer. 
The experimental results demonstrate that transformer-based models outperform traditional methods, with Longformer achieving the highest Spearman\u2019s Correlation of 0.77 and the lowest Mean Squared Error (MSE) of 0.04, indicating a strong alignment between model predictions and human grading. The findings highlight the effectiveness of deep learning models in capturing the semantic and contextual meaning of both student responses and marking guides, making it possible to develop more scalable and reliable automated grading solutions. This dataset offers valuable insights into student performance and serves as a foundational resource for integrating educational technology into automated assessment systems. Future work will focus on enhancing grading consistency and expanding the dataset for broader academic applications.<\/jats:p>","DOI":"10.3390\/data10060087","type":"journal-article","created":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T03:52:28Z","timestamp":1749181948000},"page":"87","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A Structured Dataset for Automated Grading: From Raw Data to Processed Dataset"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3336-5332","authenticated-orcid":false,"given":"Ibidapo Dare","family":"Dada","sequence":"first","affiliation":[{"name":"Department of Computer and Information Science, Covenant University, P.M.B. 1023, Ota 112104, Ogun State, Nigeria"},{"name":"Department of Computer Science, Federal University of Agriculture, P.M.B. 2240, Abeokuta 111101, Ogun State, Nigeria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Adio T.","family":"Akinwale","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Federal University of Agriculture, P.M.B. 
2240, Abeokuta 111101, Ogun State, Nigeria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2231-1201","authenticated-orcid":false,"given":"Ti-Jesu","family":"Tunde-Adeleke","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Science, Covenant University, P.M.B. 1023, Ota 112104, Ogun State, Nigeria"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"S\u00fczen, N., Gorban, A.N., Levesley, J., and Mirkes, E.M. (2020). Automatic short answer grading and feedback using text mining methods. Procedia Computer Science, Elsevier B.V.","DOI":"10.1016\/j.procs.2020.02.171"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"108486","DOI":"10.1109\/ACCESS.2019.2933354","article-title":"Syntactic, semantic and sentiment analysis: The joint effect on automated essay evaluation","volume":"7","author":"Janda","year":"2019","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.3233\/IDA-184254","article-title":"Evaluation of data analytics based clustering algorithms for knowledge mining in a student engagement data","volume":"23","author":"Oladipupo","year":"2019","journal-title":"Intell. 
Data Anal."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"10315","DOI":"10.1109\/ACCESS.2024.3352440","article-title":"A Learning Analytic Approach to Modelling Student-Staff Interaction From Students\u2019 Perception of Engagement Practices","volume":"12","author":"Oladipupo","year":"2024","journal-title":"IEEE Access"},{"key":"ref_5","first-page":"85","article-title":"On deep learning approaches to automated assessment: Strategies for short answer grading","volume":"2","author":"Ahmed","year":"2022","journal-title":"CSEDU"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lagakis, P., and Demetriadis, S. (2021, January 11\u201313). Automated essay scoring: A review of the field. Proceedings of the 2021 International Conference on Computer, Information and Telecommunication Systems (CITS), Istanbul, Turkey.","DOI":"10.1109\/CITS52676.2021.9618476"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wu, Y., Henriksson, A., Nouri, J., Duneld, M., and Li, X. (2023). Beyond Benchmarks: Spotting Key Topical Sentences While Improving Automated Essay Scoring Performance with Topic-Aware BERT. Electronics, 12.","DOI":"10.3390\/electronics12010150"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"100184","DOI":"10.1016\/j.caeo.2024.100184","article-title":"Combining human and artificial intelligence for enhanced AI literacy in higher education","volume":"6","author":"Tzirides","year":"2024","journal-title":"Comput. Educ. Open"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Garg, J., Papreja, J., Apurva, K., and Jain, G. (2022, January 24\u201326). Domain-Specific Hybrid BERT based System for Automatic Short Answer Grading. 
Proceedings of the 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India.","DOI":"10.1109\/CONIT55038.2022.9847754"},{"key":"ref_10","unstructured":"Dzikovska, M.O., Nielsen, R.D., Brew, C., Leacock, C., Giampiccolo, D., Bentivogli, L., Clark, P., Dagan, I., and Dang, H.T. (2013, January 14\u201315). SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge. Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, GA, USA."},{"key":"ref_11","unstructured":"Mohler, M., Bunescu, R., and Mihalcea, R. (2011, January 19\u201324). Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_12","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sung, C., Saha, S., Ma, T., Reddy, V., and Arora, R. (2019, January 3\u20137). Pre-training BERT on domain resources for short answer grading. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1628"},{"key":"ref_14","unstructured":"Condor, A., Litster, M., and Pardos, Z. (July, January 29). Automatic Short Answer Grading with SBERT on out-of-Sample Questions. Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), Paris, France."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Alikaniotis, D., Yannakoudakis, H., and Rei, M. (2016). 
Automatic Text Scoring Using Neural Networks. arXiv.","DOI":"10.18653\/v1\/P16-1068"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lei, W., and Meng, Z. (2022, January 25\u201327). Text similarity calculation method of Siamese network based on ALBERT. Proceedings of the 2022 International Conference on Machine Learning and Knowledge Engineering (MLKE), Guilin, China.","DOI":"10.1109\/MLKE55170.2022.00055"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1109\/TLT.2022.3175537","article-title":"Automatic Short-Answer Grading via BERT-Based Deep Neural Networks","volume":"15","author":"Zhu","year":"2022","journal-title":"IEEE Trans. Learn. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sayeed, M.A., and Gupta, D. (2022, January 14\u201316). Automate Descriptive Answer Grading using Reference based Models. Proceedings of the 2022 OITS International Conference on Information Technology (OCIT), Bhubaneswar, India.","DOI":"10.1109\/OCIT56763.2022.00057"},{"key":"ref_19","unstructured":"Ouahrani, L., and Bennouar, D. (2020, January 11\u201316). AR-ASAG An ARabic Dataset for Automatic Short Answer Grading Evaluation. Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"39457","DOI":"10.1109\/ACCESS.2023.3267407","article-title":"Automatic Arabic Grading System for Short Answer Questions","volume":"11","author":"Badry","year":"2023","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Salam, M.A., El-Fatah, M.A., and Hassan, N.F. (2022). Automatic grading for Arabic short answer questions using optimized deep learning model. 
PLoS ONE, 17.","DOI":"10.1371\/journal.pone.0272269"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"100109","DOI":"10.1016\/j.array.2021.100109","article-title":"AraScore: A deep learning-based system for Arabic short answer scoring","volume":"13","author":"Nael","year":"2022","journal-title":"Array"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/s00778-019-00564-x","article-title":"Dataset search: A survey","volume":"29","author":"Chapman","year":"2020","journal-title":"VLDB J."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1016\/j.csl.2013.10.005","article-title":"Automatic scoring for answers to Arabic test questions","volume":"28","author":"Gomaa","year":"2014","journal-title":"Comput. Speech Lang."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Oyelade, J., Isewon, I., Oladipupo, O., Emebo, O., Aromolaran, O., Uwoghiren, E., Olaniyan, D., and Olawole, O. (2019, January 1\u20134). Data Clustering: Algorithms and Its Applications. Proceedings of the 19th International Conference on Computational Science and Its Applications (ICCSA 2019), St. Petersburg, Russia.","DOI":"10.1109\/ICCSA.2019.000-1"},{"key":"ref_26","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_27","unstructured":"Beltagy, I., Peters, M.E., and Cohan, A. (2020). Longformer: The Long-Document Transformer. 
arXiv."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/6\/87\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:47:23Z","timestamp":1760032043000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/6\/87"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,6]]},"references-count":27,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["data10060087"],"URL":"https:\/\/doi.org\/10.3390\/data10060087","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,6]]}}}