{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T20:52:08Z","timestamp":1781729528778,"version":"3.54.5"},"reference-count":90,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T00:00:00Z","timestamp":1747699200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Applied Sciences"],"abstract":"<jats:p>This systematic review investigates 49 peer-reviewed studies on Large Language Model-Powered Automated Assessment (LLMPAA) published between 2018 and 2024. Following PRISMA guidelines, studies were selected from Web of Science, Scopus, IEEE, ACM Digital Library, and PubMed databases. The analysis shows that LLMPAA has been widely applied in reading comprehension, language education, and computer science, primarily using essay and short-answer formats. While models such as GPT-4 and fine-tuned BERT often exhibit high agreement with human raters (e.g., QWK = 0.99, r = 0.95), other studies report lower agreement (e.g., ICC = 0.45, r = 0.38). LLMPAA offers benefits like efficiency, scalability, and personalized feedback. However, significant challenges remain, including bias, inconsistency, hallucination, limited explainability, dataset quality, and privacy concerns. These findings indicate that while LLMPAA technologies hold promise, their effectiveness varies by context. 
Human oversight is essential to ensure fair and reliable assessment outcomes.<\/jats:p>","DOI":"10.3390\/app15105683","type":"journal-article","created":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T04:48:18Z","timestamp":1747716498000},"page":"5683","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":30,"title":["Large Language Model-Powered Automated Assessment: A Systematic Review"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3970-4406","authenticated-orcid":false,"given":"Emrah","family":"Emirtekin","sequence":"first","affiliation":[{"name":"Center for Distance Education Application and Research, Ege University, \u0130zmir 35040, Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,20]]},"reference":[{"key":"ref_1","first-page":"1066","article-title":"Improving Automatic Short Answer Scoring Task Through a Hybrid Deep Learning Framework","volume":"15","author":"Ikiss","year":"2024","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1177\/07356331231191174","article-title":"Who\u2019s the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers","volume":"61","author":"Urrutia","year":"2024","journal-title":"J. Educ. Comput. Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1016\/j.procs.2020.02.171","article-title":"Automatic Short Answer Grading and Feedback Using Text Mining Methods","volume":"169","author":"Gorban","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Yan, Y., and Liu, H. (2024). Ethical Framework for AI Education Based on Large Language Models. Educ. Inf. 
Technol., 1\u201319.","DOI":"10.1007\/s10639-024-13241-6"},{"key":"ref_5","first-page":"1","article-title":"An Overview of Automated Scoring of Essays","volume":"5","author":"Dikli","year":"2006","journal-title":"J. Technol. Learn. Assess."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1111\/emip.12537","article-title":"Using Active Learning Methods to Strategically Select Essays for Automated Scoring","volume":"42","author":"Firoozi","year":"2023","journal-title":"Educ. Meas. Issues Pract."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s40593-014-0026-8","article-title":"The Eras and Trends of Automatic Short Answer Grading","volume":"25","author":"Burrows","year":"2015","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Henkel, O., Hills, L., Roberts, B., and Mcgrane, J. (2024). Can LLMs Grade Open Response Reading Comprehension Questions? An Empirical Study Using the ROARs Dataset. Int. J. Artif. Intell. Educ., 1\u201326.","DOI":"10.1007\/s40593-024-00431-z"},{"key":"ref_9","first-page":"148","article-title":"Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015 Statement","volume":"20","author":"Moher","year":"2016","journal-title":"Rev. Esp. Nutr. Humana Diet."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Mendon\u00e7a, P.C., Quintal, F., and Mendon\u00e7a, F. (2025). Evaluating LLMs for Automated Scoring in Formative Assessments. Appl. Sci., 15.","DOI":"10.3390\/app15052787"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Thurzo, A. (2025). Provable AI Ethics and Explainability in Medical and Educational AI Agents: Trustworthy Ethical Firewall. Electronics, 14.","DOI":"10.20944\/preprints202502.2232.v1"},{"key":"ref_12","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). 
BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL HLT 2019\u20142019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies\u2014Proceedings of the Conference, Minneapolis, MN, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Arici, N., Gerevini, A.E., Olivato, M., Putelli, L., Sigalini, L., and Serina, I. (2023). Real-World Implementation and Integration of an Automatic Scoring System for Workplace Safety Courses in Italian. Future Internet, 15.","DOI":"10.3390\/fi15080268"},{"key":"ref_14","first-page":"395","article-title":"Optimization of AES Using BERT and BiLSTM for Grading the Online Exams","volume":"17","author":"Azhari","year":"2024","journal-title":"Int. J. Intell. Eng. Syst."},{"key":"ref_15","first-page":"204","article-title":"An Empirical Analysis of BERT Embedding for Automated Essay Scoring","volume":"11","author":"Beseiso","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bonthu, S., Sree, S.R., and Prasad, M.H.M.K. (2024). SPRAG: Building and Benchmarking a Short Programming-Related Answer Grading Dataset. Int. J. Data Sci. Anal., 1\u201313.","DOI":"10.1007\/s41060-024-00576-z"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1007\/s10639-024-12891-w","article-title":"ChatGPT as an Automated Essay Scoring Tool in the Writing Classrooms: How It Compares with Human Scoring","volume":"30","author":"Bui","year":"2024","journal-title":"Educ. Inf. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"e2191","DOI":"10.7717\/peerj-cs.2191","article-title":"An Optimized BERT Fine-Tuned Model Using an Artificial Bee Colony Algorithm for Automatic Essay Score Prediction","volume":"10","author":"Chassab","year":"2024","journal-title":"PeerJ Comput. 
Sci."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"82","DOI":"10.4218\/etrij.2023-0324","article-title":"Dual-Scale BERT Using Multi-Trait Representations for Holistic and Trait-Specific Essay Grading","volume":"46","author":"Cho","year":"2024","journal-title":"ETRI J."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1080\/08957347.2024.2386945","article-title":"Automated Scoring of Short-Answer Questions: A Progress Report","volume":"37","author":"Clauser","year":"2024","journal-title":"Appl. Meas. Educ."},{"key":"ref_21","first-page":"3788","article-title":"ChatGPT as an Instructor\u2019s Assistant for Generating and Scoring Exams","volume":"101","year":"2024","journal-title":"J. Chem. Educ."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1111\/jedm.12406","article-title":"Using Automated Procedures to Score Educational Essays Written in Three Languages","volume":"62","author":"Firoozi","year":"2025","journal-title":"J. Educ. Meas."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gr\u00e9visse, C. (2024). LLM-Based Automatic Short Answer Grading in Undergraduate Medical Education. BMC Med. Educ., 24.","DOI":"10.1186\/s12909-024-06026-5"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Huang, F., Sun, X., Mei, A., Wang, Y., Ding, H., and Zhu, T. (2024). LLM Plus Machine Learning Outperform Expert Rating to Predict Life Satisfaction from Self-Statement Text. IEEE Trans. Comput. Soc. Syst.","DOI":"10.1109\/TCSS.2024.3475413"},{"key":"ref_25","first-page":"571","article-title":"Can AI-Assisted Essay Assessment Support Teachers? A Cross-Sectional Mixed-Methods Research Conducted at the University of Montenegro","volume":"33","author":"Ivanovic","year":"2023","journal-title":"Ann. Istrian Mediterr. 
Stud."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"478","DOI":"10.26803\/ijlter.23.2.23","article-title":"A Comparative Analysis of the Rating of College Students\u2019 Essays by ChatGPT versus Human Raters","volume":"23","author":"Jackaria","year":"2024","journal-title":"Int. J. Learn. Teach. Educ. Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"96332","DOI":"10.1109\/ACCESS.2024.3420890","article-title":"A Hybrid Approach for Automated Short Answer Grading","volume":"12","author":"Kaya","year":"2024","journal-title":"IEEE Access"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"020163","DOI":"10.1103\/PhysRevPhysEducRes.19.020163","article-title":"Toward AI Grading of Student Problem Solutions in Introductory Physics: A Feasibility Study","volume":"19","author":"Kortemeyer","year":"2023","journal-title":"Phys. Rev. Phys. Educ. Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/s44163-024-00147-y","article-title":"Performance of the Pre-Trained Large Language Model GPT-4 on Automated Short Answer Grading","volume":"4","author":"Kortemeyer","year":"2024","journal-title":"Discov. Artif. Intell."},{"key":"ref_30","first-page":"785","article-title":"Automated Essay Scoring Using Convolutional Neural Network Long Short-Term Memory with Mean of Question-Answer Encoding","volume":"18","author":"Kusumaningrum","year":"2024","journal-title":"ICIC Express Lett."},{"key":"ref_31","first-page":"282","article-title":"A BERT-Based Automatic Scoring Model of Korean Language Learners\u2019 Essay","volume":"18","author":"Lee","year":"2022","journal-title":"J. Inf. Process. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Li, F., Xi, X., Cui, Z., Li, D., and Zeng, W. (2023). Automatic Essay Scoring Method Based on Multi-Scale Features. Appl. 
Sci., 13.","DOI":"10.3390\/app13116775"},{"key":"ref_33","first-page":"1","article-title":"Applying Large Language Models for Automated Essay Scoring for Non-Native Japanese","volume":"11","author":"Li","year":"2024","journal-title":"Humanit. Soc. Sci. Commun."},{"key":"ref_34","first-page":"1","article-title":"ChatGPT Analysis of Strengths and Weaknesses in English Writing and Their Implications","volume":"9","author":"Li","year":"2024","journal-title":"Appl. Math. Nonlinear Sci."},{"key":"ref_35","first-page":"176992","article-title":"Enhanced BERT Approach to Score Arabic Essay\u2019s Relevance to the Prompt","volume":"2024","author":"Machhout","year":"2024","journal-title":"Commun. IBIMA"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"4565","DOI":"10.1007\/s10639-023-11890-7","article-title":"A Deep-Learning-Based Grading System (ASAG) for Reading Comprehension Assessment by Using Aphorisms as Open-Answer-Questions","volume":"29","author":"Mardini","year":"2024","journal-title":"Educ. Inf. Technol."},{"key":"ref_37","first-page":"768","article-title":"Automatic Essay Scoring for Arabic Short Answer Questions Using Text Mining Techniques","volume":"14","author":"Meccawy","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"100050","DOI":"10.1016\/j.rmal.2023.100050","article-title":"Exploring the Potential of Using an AI Language Model for Automated Essay Scoring","volume":"2","author":"Mizumoto","year":"2023","journal-title":"Res. Methods Appl. Linguist."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"101356","DOI":"10.1016\/j.tsc.2023.101356","article-title":"Beyond Semantic Distance: Automated Scoring of Divergent Thinking Greatly Improves with Large Language Models","volume":"49","author":"Organisciak","year":"2023","journal-title":"Think. Ski. 
Creat."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"100234","DOI":"10.1016\/j.caeai.2024.100234","article-title":"Large Language Models and Automated Essay Scoring of English Language Learner Writing: Insights into Validity and Reliability","volume":"6","author":"Pack","year":"2024","journal-title":"Comput. Educ. Artif. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Quah, B., Zheng, L., Sng, T.J.H., Yong, C.W., and Islam, I. (2024). Reliability of ChatGPT in Automated Essay Scoring for Dental Undergraduate Examinations. BMC Med. Educ., 24.","DOI":"10.1186\/s12909-024-05881-6"},{"key":"ref_42","first-page":"454","article-title":"A Multitask Learning System for Trait-Based Automated Short Answer Scoring","volume":"14","author":"Ramesh","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"e12684","DOI":"10.1111\/ejed.12684","article-title":"Coherence-Based Automatic Short Answer Scoring Using Sentence Embedding","volume":"59","author":"Ramesh","year":"2024","journal-title":"Eur. J. Educ."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.11591\/eei.v11i3.3531","article-title":"Indonesian Automatic Short Answer Grading System","volume":"11","author":"Salim","year":"2022","journal-title":"Bull. Electr. Eng. Inform."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"595","DOI":"10.29271\/jcpsp.2024.05.595","article-title":"The Revival of Essay-Type Questions in Medical Education: Harnessing Artificial Intelligence and Machine Learning","volume":"34","author":"Shamim","year":"2024","journal-title":"J. Coll. Physicians Surg. Pak."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"4979","DOI":"10.1007\/s10489-024-05410-4","article-title":"Modeling Essay Grading with Pre-Trained BERT Features","volume":"54","author":"Sharma","year":"2024","journal-title":"Appl. 
Intell."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1920","DOI":"10.1109\/TLT.2024.3396873","article-title":"Automated Essay Scoring and Revising Based on Open-Source Large Language Models","volume":"17","author":"Song","year":"2024","journal-title":"IEEE Trans. Learn. Technol."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"e34262","DOI":"10.1016\/j.heliyon.2024.e34262","article-title":"Harnessing LLMs for Multi-Dimensional Writing Assessment: Reliability and Alignment with Human Judgments","volume":"10","author":"Tang","year":"2024","journal-title":"Heliyon"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"100255","DOI":"10.1016\/j.caeai.2024.100255","article-title":"Can AI Provide Useful Holistic Essay Scoring?","volume":"7","author":"Tate","year":"2024","journal-title":"Comput. Educ. Artif. Intell."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Wang, Q., and Gayed, J.M. (2024). Effectiveness of Large Language Models in Automated Evaluation of Argumentative Essays: Finetuning vs. Zero-Shot Prompting. Comput. Assist. Lang. Learn.","DOI":"10.1080\/09588221.2024.2371395"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Wijanto, M.C., and Yong, H.S. (2024). Combining Balancing Dataset and SentenceTransformers to Improve Short Answer Grading Performance. Appl. Sci., 14.","DOI":"10.3390\/app14114532"},{"key":"ref_52","first-page":"503","article-title":"Automatic Short Answer Grading System in Indonesian Language Using BERT Machine Learning","volume":"35","author":"Wijaya","year":"2021","journal-title":"Rev. D\u2019intelligence Artif."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Wu, Y., Henriksson, A., Nouri, J., Duneld, M., and Li, X. (2022). Beyond Benchmarks: Spotting Key Topical Sentences While Improving Automated Essay Scoring Performance with Topic-Aware BERT. 
Electronics, 12.","DOI":"10.3390\/electronics12010150"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1109\/TLT.2022.3175537","article-title":"Automatic Short-Answer Grading via BERT-Based Deep Neural Networks","volume":"15","author":"Zhu","year":"2022","journal-title":"IEEE Trans. Learn. Technol."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"125403","DOI":"10.1109\/ACCESS.2021.3110683","article-title":"A Hierarchical BERT-Based Transfer Learning Approach for Multi-Dimensional Essay Scoring","volume":"9","author":"Xue","year":"2021","journal-title":"IEEE Access"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"100133","DOI":"10.1016\/j.rmal.2024.100133","article-title":"An Application of Many-Facet Rasch Measurement to Evaluate Automated Essay Scoring: A Case of ChatGPT-4.0","volume":"3","author":"Yamashita","year":"2024","journal-title":"Res. Methods Appl. Linguist."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1111\/bjet.13494","article-title":"Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments","volume":"56","author":"Yavuz","year":"2024","journal-title":"Br. J. Educ. Technol."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"100779","DOI":"10.1016\/j.patter.2023.100779","article-title":"GPT Detectors Are Biased against Non-Native English Writers","volume":"4","author":"Liang","year":"2023","journal-title":"Patterns"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Lin, H., and Chen, Q. (2024). Artificial Intelligence (AI) -Integrated Educational Applications and College Students\u2019 Creativity and Academic Emotions: Students and Teachers\u2019 Perceptions and Attitudes. BMC Psychol., 12.","DOI":"10.1186\/s40359-024-01979-0"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Hackl, V., M\u00fcller, A.E., Granitzer, M., and Sailer, M. (2023). Is GPT-4 a Reliable Rater? 
Evaluating Consistency in GPT-4\u2019s Text Ratings. Front. Educ., 8.","DOI":"10.3389\/feduc.2023.1272229"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"e2405460121","DOI":"10.1073\/pnas.2405460121","article-title":"Evaluating Large Language Models in Theory of Mind Tasks","volume":"121","author":"Kosinski","year":"2023","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"e2218523120","DOI":"10.1073\/pnas.2218523120","article-title":"Using Cognitive Psychology to Understand GPT-3","volume":"120","author":"Binz","year":"2023","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1080\/0142159X.2023.2208731","article-title":"Medical Teacher\u2019s First ChatGPT\u2019s Referencing Hallucinations: Lessons for Editors, Reviewers, and Teachers","volume":"45","author":"Masters","year":"2023","journal-title":"Med. Teach."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"e103","DOI":"10.52225\/narra.v3i1.103","article-title":"ChatGPT Applications in Medical, Dental, Pharmacy, and Public Health Education: A Descriptive Study Highlighting the Advantages and Limitations","volume":"3","author":"Sallam","year":"2023","journal-title":"Narra J."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"e48291","DOI":"10.2196\/48291","article-title":"Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions","volume":"9","author":"AlSaad","year":"2023","journal-title":"JMIR Med. Educ."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"81","DOI":"10.54097\/fcis.v2i2.4465","article-title":"The Benefits and Challenges of ChatGPT: An Overview","volume":"2","author":"Deng","year":"2022","journal-title":"Front. Comput. Intell. Syst."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"102274","DOI":"10.1016\/j.lindif.2023.102274","article-title":"ChatGPT for Good? 
On Opportunities and Challenges of Large Language Models for Education","volume":"103","author":"Kasneci","year":"2023","journal-title":"Learn. Individ. Differ."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"100211","DOI":"10.1016\/j.hcc.2024.100211","article-title":"A Survey on Large Language Model (LLM) Security and Privacy: The Good, The Bad, and The Ugly","volume":"4","author":"Yao","year":"2024","journal-title":"High-Confid. Comput."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"100206","DOI":"10.1016\/j.caeai.2024.100206","article-title":"Automatic Assessment of Text-Based Responses in Post-Secondary Education: A Systematic Review","volume":"6","author":"Gao","year":"2024","journal-title":"Comput. Educ. Artif. Intell."},{"key":"ref_70","first-page":"1877","article-title":"Language Models Are Few-Shot Learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process Syst."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"010109","DOI":"10.1103\/PhysRevPhysEducRes.20.010109","article-title":"Performance of ChatGPT on the Test of Understanding Graphs in Kinematics","volume":"20","author":"Polverini","year":"2024","journal-title":"Phys. Rev. Phys. Educ. Res."},{"key":"ref_72","unstructured":"Kaufmann, T., Weng, P., Kunshan, D., Bengs, V., and H\u00fcllermeier, E. (2023). A Survey of Reinforcement Learning from Human Feedback. arXiv."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Atkinson, J., and Palma, D. (2025). An LLM-Based Hybrid Approach for Enhanced Automated Essay Scoring. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-87862-3"},{"key":"ref_74","unstructured":"(2025, May 08). Introducing Claude 3.5 Sonnet\\Anthropic. 
Available online: https:\/\/www.anthropic.com\/news\/claude-3-5-sonnet."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"2629","DOI":"10.1007\/s10439-023-03272-4","article-title":"Prompt Engineering with ChatGPT: A Guide for Academic Writers","volume":"51","author":"Giray","year":"2023","journal-title":"Ann. Biomed. Eng."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"100022","DOI":"10.1016\/j.metrad.2023.100022","article-title":"A Comprehensive Survey of ChatGPT: Advancements, Applications, Prospects, and Challenges","volume":"1","author":"Nazir","year":"2023","journal-title":"Meta Radiol."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1089\/big.2013.1508","article-title":"Data Science and Its Relationship to Big Data and Data-Driven Decision Making","volume":"1","author":"Provost","year":"2013","journal-title":"Big Data"},{"key":"ref_78","unstructured":"Powers, D.M.W. (2020). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1037\/h0026256","article-title":"Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit","volume":"70","author":"Cohen","year":"1968","journal-title":"Psychol. Bull."},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Mohammad, A.F., Clark, B., Agarwal, R., and Summers, S. (2023, January 24\u201327). LLM\/GPT Generative AI and Artificial General Intelligence (AGI): The Next Frontier. Proceedings of the 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023, Las Vegas, NV, USA.","DOI":"10.1109\/CSCE60160.2023.00073"},{"key":"ref_81","unstructured":"(2025, May 07). AI Principles|OECD. Available online: https:\/\/www.oecd.org\/en\/topics\/ai-principles.html."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Shahriari, K., and Shahriari, M. (2017, January 21\u201322). 
IEEE Standard Review\u2014Ethically Aligned Design: A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems. Proceedings of the IHTC 2017\u2014IEEE Canada International Humanitarian Technology Conference 2017, Toronto, ON, Canada.","DOI":"10.1109\/IHTC.2017.8058187"},{"key":"ref_83","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018, January 15\u201320). Universal Language Model Fine-Tuning for Text Classification. Proceedings of the ACL 2018\u201456th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_85","first-page":"4300","article-title":"Deep Reinforcement Learning from Human Preferences","volume":"2017","author":"Christiano","year":"2017","journal-title":"Adv. Neural Inf. Process Syst."},{"key":"ref_86","unstructured":"Gal, Y., and Ghahramani, Z. (2016, January 19\u201324). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, NY, USA."},{"key":"ref_87","first-page":"1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel","year":"2019","journal-title":"J. Mach. Learn. Res."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Goslen, A., Kim, Y.J., Rowe, J., and Lester, J. (2024). LLM-Based Student Plan Generation for Adaptive Scaffolding in Game-Based Learning Environments. Int. J. Artif. Intell. Educ., 1\u201326.","DOI":"10.1007\/s40593-024-00421-1"},{"key":"ref_89","unstructured":"(2025, May 08). Grok 3 Beta\u2014The Age of Reasoning Agents|XAI. 
Available online: https:\/\/x.ai\/news\/grok-3."},{"key":"ref_90","unstructured":"Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., and Ruan, C. (2024). Deepseek-v3 technical report. arXiv."}],"container-title":["Applied Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2076-3417\/15\/10\/5683\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:35:26Z","timestamp":1760031326000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2076-3417\/15\/10\/5683"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,20]]},"references-count":90,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["app15105683"],"URL":"https:\/\/doi.org\/10.3390\/app15105683","relation":{},"ISSN":["2076-3417"],"issn-type":[{"value":"2076-3417","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,20]]}}}