{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T15:01:29Z","timestamp":1771858889318,"version":"3.50.1"},"reference-count":67,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T00:00:00Z","timestamp":1771372800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T00:00:00Z","timestamp":1771804800000},"content-version":"vor","delay-in-days":5,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005967","name":"Linnaeus University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005967","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Discov Artif Intell"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The assessment of open-ended written work is of vital importance to the student learning experience. Conventional essay grading methods heavily depend on expert manual assessment, making them susceptible to errors due to fatigue, bias, and subjectivity. To address this, recent research has introduced AI-based Automated Essay Scoring (AES) systems. While most studies have concentrated on predicting scores, only a few have integrated AES systems with the well-known Large Language Models (LLMs). This study explores the application of LLMs, including GPT and Gemini for AES. The proposed approach was evaluated on two benchmark datasets, namely \u201cHewlett Foundation: Automated Essay Scoring (ASAP\u2013AES)\u201d and \u201cLearning Agency Lab\u2013Automated Essay Scoring 2.0 (LA\u2013AES)\u201d. The proposed method achieved promising results in AES, demonstrating effectiveness on both the benchmark datasets. 
Statistical analysis revealed that Gemini outperformed GPT, achieving an average Quadratic Weighted Kappa (QWK) score of 0.45 on the ASAP\u2013AES and 0.43 on the LA\u2013AES. To assess the generalizability and objectivity of the proposed approach, real-world data was collected from an O-Level classroom at Sukkur IBA Community College, Pakistan. Multiple human evaluators participated in the study to examine potential biases in human assessment. The findings indicate that LLM-based scoring demonstrates improved objectivity and reduced bias compared to human assessors.<\/jats:p>","DOI":"10.1007\/s44163-026-01002-y","type":"journal-article","created":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T05:45:48Z","timestamp":1771393548000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Exploring potential of large language models for automated essay scoring in education"],"prefix":"10.1007","volume":"6","author":[{"given":"Nimra","family":"Mughal","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ali Shariq","family":"Imran","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sher Muhammad","family":"Daudpota","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zenun","family":"Kastrati","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Waheed","family":"Noor","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,2,18]]},"reference":[{"issue":"4","key":"1002_CR1","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1037\/0022-3514.35.4.250","volume":"35","author":"RE Nisbett","year":"1977","unstructured":"Nisbett RE, Wilson TD. 
The halo effect: evidence for unconscious alteration of judgments. J Pers Soc Psychol. 1977;35(4):250.","journal-title":"J Pers Soc Psychol"},{"issue":"3","key":"1002_CR2","doi-asserted-by":"publisher","first-page":"2495","DOI":"10.1007\/s10462-021-10068-2","volume":"55","author":"D Ramesh","year":"2022","unstructured":"Ramesh D, Sanampudi SK. An automated essay scoring systems: a systematic literature review. Artif Intell Rev. 2022;55(3):2495\u2013527.","journal-title":"Artif Intell Rev"},{"issue":"3","key":"1002_CR3","doi-asserted-by":"publisher","first-page":"415","DOI":"10.17239\/jowr-2020.11.03.01","volume":"11","author":"SA Crossley","year":"2020","unstructured":"Crossley SA. Linguistic features in writing quality and development: An overview. J Writ Res. 2020;11(3):415\u201343.","journal-title":"J Writ Res"},{"issue":"4","key":"1002_CR4","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1177\/0265532217712554","volume":"34","author":"K Kyle","year":"2017","unstructured":"Kyle K, Crossley S. Assessing syntactic sophistication in l2 writing: a usage-based approach. Lang Test. 2017;34(4):513\u201335.","journal-title":"Lang Test"},{"issue":"2","key":"1002_CR5","doi-asserted-by":"publisher","first-page":"21","DOI":"10.48161\/qaj.v1n2a40","volume":"1","author":"DH Maulud","year":"2021","unstructured":"Maulud DH, Zeebaree SR, Jacksi K, Sadeeq MAM, Sharif KH. State of art for semantic analysis of natural language processing. Qubahan Acad J. 2021;1(2):21\u20138.","journal-title":"Qubahan Acad J"},{"issue":"5","key":"1002_CR6","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/5254.889104","volume":"15","author":"MA Hearst","year":"2000","unstructured":"Hearst MA. The debate on automated essay grading. IEEE Intell Syst Appl. 
2000;15(5):22\u201337.","journal-title":"IEEE Intell Syst Appl"},{"issue":"2","key":"1002_CR7","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1075\/itl.165.2.01col","volume":"165","author":"K Collins-Thompson","year":"2014","unstructured":"Collins-Thompson K. Computational assessment of text readability: a survey of current and future research. ITL-Int J Appl Linguist. 2014;165(2):97\u2013135.","journal-title":"ITL-Int J Appl Linguist"},{"issue":"3","key":"1002_CR8","doi-asserted-by":"publisher","first-page":"1875","DOI":"10.47836\/pjst.29.3.27","volume":"29","author":"CT Lim","year":"2021","unstructured":"Lim CT, Bong CH, Wong WS, Lee NK. A comprehensive review of automated essay scoring (AES) research and development. Pertanika J Sci Technol. 2021;29(3):1875\u201399.","journal-title":"Pertanika J Sci Technol"},{"key":"1002_CR9","first-page":"64","volume":"5","author":"LR Medsker","year":"2001","unstructured":"Medsker LR, Jain L. Recurrent neural networks. Des Appl. 2001;5:64\u20137.","journal-title":"Des Appl"},{"issue":"7","key":"1002_CR10","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1162\/neco_a_01199","volume":"31","author":"Y Yu","year":"2019","unstructured":"Yu Y, Si X, Hu C, Zhang J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019;31(7):1235\u201370.","journal-title":"Neural Comput"},{"issue":"4","key":"1002_CR11","doi-asserted-by":"publisher","first-page":"897","DOI":"10.3390\/psych3040056","volume":"3","author":"S Ludwig","year":"2021","unstructured":"Ludwig S, Mayer C, Hansen C, Eilers K, Brandt S. Automated essay scoring using transformer models. Psych. 2021;3(4):897\u2013915.","journal-title":"Psych"},{"key":"1002_CR12","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/j.aiopen.2022.10.001","volume":"3","author":"T Lin","year":"2022","unstructured":"Lin T, Wang Y, Liu X, Qiu X. A survey of transformers. AI Open. 
2022;3:111\u201332.","journal-title":"AI Open"},{"key":"1002_CR13","doi-asserted-by":"publisher","first-page":"125403","DOI":"10.1109\/ACCESS.2021.3110683","volume":"9","author":"J Xue","year":"2021","unstructured":"Xue J, Tang X, Zheng L. A hierarchical bert-based transfer learning approach for multi-dimensional essay scoring. IEEE Access. 2021;9:125403\u201315.","journal-title":"IEEE Access"},{"key":"1002_CR14","unstructured":"Koroteev M. Bert: a review of applications in natural language processing and understanding. arXiv preprint arXiv:2103.11943. 2021."},{"key":"1002_CR15","doi-asserted-by":"crossref","unstructured":"Mayfield E, Black AW. Should you fine-tune bert for automated essay scoring? In: Proceedings of the fifteenth workshop on innovative use of NLP for building educational applications. 2020. pp. 151\u2013162.","DOI":"10.18653\/v1\/2020.bea-1.15"},{"key":"1002_CR16","doi-asserted-by":"publisher","unstructured":"Prabhu S, Akhila KSS. A hybrid approach towards automated essay evaluation based on bert and feature engineering. In: 2022 IEEE 7th international conference for convergence in technology (I2CT). 2022. pp. 1\u20134. https:\/\/doi.org\/10.1109\/I2CT54291.2022.9824999.","DOI":"10.1109\/I2CT54291.2022.9824999"},{"key":"1002_CR17","doi-asserted-by":"crossref","unstructured":"Zheng C, Huang L, Lin H, Guo Y, Huang L. Bert-based automatic scoring model for speech-oriented text modality. In: 2022 IEEE 2nd international conference on electronic technology, communication and information (ICETCI). IEEE; 2022. pp. 100\u2013105.","DOI":"10.1109\/ICETCI55101.2022.9832254"},{"key":"1002_CR18","doi-asserted-by":"publisher","first-page":"1880","DOI":"10.1109\/TLT.2024.3396873","volume":"17","author":"Y Song","year":"2024","unstructured":"Song Y, Zhu Q, Wang H, Zheng Q. Automated essay scoring and revising based on open-source large language models. IEEE Trans Learn Technol. 2024a;17:1880\u201390. 
https:\/\/doi.org\/10.1109\/TLT.2024.3396873.","journal-title":"IEEE Trans Learn Technol"},{"issue":"1","key":"1002_CR19","first-page":"1","volume":"11","author":"W Li","year":"2024","unstructured":"Li W, Liu H. Applying large language models for automated essay scoring for non-native Japanese. Human Social Sci Commun. 2024;11(1):1\u201315.","journal-title":"Human Social Sci Commun"},{"key":"1002_CR20","doi-asserted-by":"crossref","unstructured":"Lee S, Cai Y, Meng D, Wang Z, Wu Y. Unleashing large language models\u2019 proficiency in zero-shot essay scoring. In: Findings of the association for computational linguistics: EMNLP 2024. 2024. pp. 181\u2013198.","DOI":"10.18653\/v1\/2024.findings-emnlp.10"},{"issue":"2","key":"1002_CR21","first-page":"15","volume":"2013","author":"D Blanchard","year":"2013","unstructured":"Blanchard D, Tetreault J, Higgins D, Cahill A, Chodorow M. Toefl11: A corpus of non-native English. ETS Res Rep Ser. 2013;2013(2):15.","journal-title":"ETS Res Rep Ser"},{"key":"1002_CR22","doi-asserted-by":"crossref","unstructured":"Abraham A. Rule-based expert systems. Handbook of measuring system design. 2005.","DOI":"10.1002\/0471497398.mm422"},{"key":"1002_CR23","volume-title":"Probabilistic networks and expert systems: exact computational methods for Bayesian networks","author":"RG Cowell","year":"2007","unstructured":"Cowell RG, Dawid P, Lauritzen SL, Spiegelhalter DJ. Probabilistic networks and expert systems: exact computational methods for Bayesian networks. New York: Springer; 2007."},{"key":"1002_CR24","first-page":"012030","volume-title":"Journal of physics: conference series","author":"V Ramalingam","year":"2018","unstructured":"Ramalingam V, Pandian A, Chetry P, Nigam H. Automated essay grading using machine learning algorithm. In: Publishing IOP, editor. Journal of physics: conference series, vol. 1000. Bristol; 2018. p. 012030."},{"key":"1002_CR25","doi-asserted-by":"crossref","unstructured":"Ke Z, Ng V. 
Automated essay scoring: a survey of the state of the art. In: IJCAI, vol 19. 2019. pp. 6300\u20136308.","DOI":"10.24963\/ijcai.2019\/879"},{"key":"1002_CR26","doi-asserted-by":"crossref","unstructured":"Baidoo-Anu D, Owusu\u00a0Ansah L. Education in the era of generative artificial intelligence (ai): Understanding the potential benefits of chatgpt in promoting teaching and learning. Available at SSRN 4337484. 2023.","DOI":"10.2139\/ssrn.4337484"},{"issue":"2","key":"1002_CR27","doi-asserted-by":"publisher","first-page":"599","DOI":"10.1111\/jcal.12635","volume":"38","author":"A Nunes","year":"2022","unstructured":"Nunes A, Cordeiro C, Limpo T, Castro SL. Effectiveness of automated writing evaluation systems in school settings: a systematic review of studies from 2000 to 2020. J Comput Assist Learn. 2022;38(2):599\u2013620.","journal-title":"J Comput Assist Learn"},{"key":"1002_CR28","doi-asserted-by":"crossref","unstructured":"Connor U. Linguistic\/rhetorical measures for international persuasive student writing. Res Teach English. 1990;67\u201387.","DOI":"10.58680\/rte199015501"},{"issue":"2","key":"1002_CR29","first-page":"209","volume":"12","author":"GL Parra","year":"2019","unstructured":"Parra GL, Calero SX. Automated writing evaluation tools in the improvement of the writing skill. Int J Instr. 2019;12(2):209\u201326.","journal-title":"Int J Instr"},{"key":"1002_CR30","doi-asserted-by":"publisher","first-page":"208","DOI":"10.7717\/peerj-cs.208","volume":"5","author":"MA Hussein","year":"2019","unstructured":"Hussein MA, Hassan H, Nassef M. Automated language essay scoring systems: a literature review. PeerJ Comput Sci. 2019;5:208.","journal-title":"PeerJ Comput Sci"},{"key":"1002_CR31","first-page":"1","volume-title":"Handbook of open distance and digital education","author":"D Ifenthaler","year":"2022","unstructured":"Ifenthaler D. Automated essay scoring systems. In: Handbook of open distance and digital education. New York: Springer; 2022. p. 
1\u201315."},{"key":"1002_CR32","unstructured":"Lu C, Cutumisu M. Integrating deep learning into an automated feedback generation system for automated essay scoring. Int Educ Data Min Soc. 2021."},{"key":"1002_CR33","unstructured":"Attali Y, Burstein J. Automated essay scoring with e-rater\u00ae v.2. J Technol Learn Assess 2006;4(3)."},{"key":"1002_CR34","volume-title":"Introduction to machine learning","author":"E Alpaydin","year":"2020","unstructured":"Alpaydin E. Introduction to machine learning. Cambridge: MIT press; 2020."},{"issue":"3","key":"1002_CR35","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1080\/0969594032000148154","volume":"10","author":"TK Landauer","year":"2003","unstructured":"Landauer TK. Automatic essay assessment. Assess Educ Principles Policy Practice. 2003;10(3):295\u2013308.","journal-title":"Assess Educ Principles Policy Practice"},{"key":"1002_CR36","doi-asserted-by":"publisher","first-page":"381","DOI":"10.21275\/ART20203995","volume":"9","author":"B Mahesh","year":"2020","unstructured":"Mahesh B. Machine learning algorithms-a review. Int J Sci Res (IJSR) [Internet]. 2020;9:381\u20136.","journal-title":"Int J Sci Res (IJSR) [Internet]"},{"key":"1002_CR37","unstructured":"Cohen Y, Ben-Simon A, Hovav M. The effect of specific language features on the complexity of systems for automated essay scoring. 2003."},{"key":"1002_CR38","unstructured":"Mahana M, Johns M, Apte A. Automated essay grading using machine learning. Mach. Learn Session 2012;5."},{"key":"1002_CR39","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/s13042-010-0001-0","volume":"1","author":"Y Zhang","year":"2010","unstructured":"Zhang Y, Jin R, Zhou Z-H. Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern. 2010;1:43\u201352.","journal-title":"Int J Mach Learn Cybern"},{"key":"1002_CR40","unstructured":"Heeman PA. Pos tags and decision trees for language modeling. 
In: 1999 joint SIGDAT conference on empirical methods in natural language processing and very large corpora. 1999."},{"key":"1002_CR41","volume-title":"The structure of English orthography","author":"RL Venezky","year":"2011","unstructured":"Venezky RL. The structure of English orthography. In: The structure of English orthography. Berlin: De Gruyter Mouton; 2011."},{"key":"1002_CR42","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1007\/s11831-019-09344-w","volume":"27","author":"S Dargan","year":"2020","unstructured":"Dargan S, Kumar M, Ayyagari MR, Kumar G. A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng. 2020;27:1071\u201392.","journal-title":"Arch Comput Methods Eng"},{"issue":"6","key":"1002_CR43","doi-asserted-by":"publisher","first-page":"706","DOI":"10.1016\/j.ajog.2023.03.010","volume":"228","author":"MR Chavez","year":"2023","unstructured":"Chavez MR, Butler TS, Rekawek P, Heo H, Kinzler WL. Chat generative pre-trained transformer: why we should embrace this technology. Am J Obstet Gynecol. 2023;228(6):706\u201311.","journal-title":"Am J Obstet Gynecol"},{"key":"1002_CR44","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A.N, Kaiser \u0141, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst. 2017;30."},{"key":"1002_CR45","doi-asserted-by":"crossref","unstructured":"Tenney I, Das D, Pavlick E. Bert rediscovers the classical nlp pipeline. arXiv preprint arXiv:1905.05950. 2019.","DOI":"10.18653\/v1\/P19-1452"},{"issue":"1","key":"1002_CR46","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1017\/S1351324920000601","volume":"27","author":"R Dale","year":"2021","unstructured":"Dale R. Gpt-3: What\u2019s it good for? Nat Lang Eng. 2021;27(1):113\u20138.","journal-title":"Nat Lang Eng"},{"key":"1002_CR47","doi-asserted-by":"crossref","unstructured":"Annepaka Y, Pakray P. 
Large language models: A survey of their development, capabilities, and applications. Knowl Inf Syst 2024;1\u201356.","DOI":"10.1007\/s10115-024-02310-4"},{"issue":"1","key":"1002_CR48","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1145\/3416063","volume":"27","author":"V Cohen","year":"2020","unstructured":"Cohen V, Gokaslan A. Opengpt-2: Open language models and implications of generated text. XRDS Crossroads ACM Mag Stud. 2020;27(1):26\u201330.","journal-title":"XRDS Crossroads ACM Mag Stud"},{"issue":"1","key":"1002_CR49","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/s40561-024-00310-z","volume":"11","author":"M Imran","year":"2024","unstructured":"Imran M, Almusharraf N. Google gemini as a next generation ai educational tool: a review of emerging educational technology. Smart Learn Environ. 2024;11(1):22.","journal-title":"Smart Learn Environ"},{"key":"1002_CR50","doi-asserted-by":"crossref","unstructured":"Masalkhi M, Ong J, Waisberg E, Zaman N, Sarker P, Lee AG, Tavakkoli A. A side-by-side evaluation of llama 2 by meta with chatgpt and its application in ophthalmology. Eye 2024;1\u20134.","DOI":"10.1038\/s41433-024-02972-y"},{"key":"1002_CR51","doi-asserted-by":"crossref","unstructured":"Meyer L, Dannecker A. Comparative analysis of generative ai models in educational exercise performance. In: EDULEARN24 Proceedings. IATED; 2024. pp. 5181\u20135190.","DOI":"10.21125\/edulearn.2024.1273"},{"key":"1002_CR52","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1007\/s11023-020-09548-1","volume":"30","author":"L Floridi","year":"2020","unstructured":"Floridi L, Chiriatti M. Gpt-3: its nature, scope, limits, and consequences. Mind Mach. 2020;30:681\u201394.","journal-title":"Mind Mach"},{"issue":"3","key":"1002_CR53","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1108\/LHTN-01-2023-0009","volume":"40","author":"BD Lund","year":"2023","unstructured":"Lund BD, Wang T. 
Chatting about chatgpt: How may ai and gpt impact academia and libraries? Library Hi Tech News. 2023;40(3):26\u20139.","journal-title":"Library Hi Tech News"},{"key":"1002_CR54","doi-asserted-by":"publisher","DOI":"10.1016\/j.lindif.2023.102274","volume":"103","author":"E Kasneci","year":"2023","unstructured":"Kasneci E, Se\u00dfler K, K\u00fcchemann S, Bannert M, Dementieva D, Fischer F, et al. Chatgpt for good? On opportunities and challenges of large language models for education. Learn Individ Differ. 2023;103:102274.","journal-title":"Learn Individ Differ"},{"issue":"2","key":"1002_CR55","doi-asserted-by":"publisher","DOI":"10.1016\/j.rmal.2023.100050","volume":"2","author":"A Mizumoto","year":"2023","unstructured":"Mizumoto A, Eguchi M. Exploring the potential of using an ai language model for automated essay scoring. Res Methods Appl Linguist. 2023;2(2):100050.","journal-title":"Res Methods Appl Linguist"},{"key":"1002_CR56","doi-asserted-by":"publisher","DOI":"10.1016\/j.caeai.2024.100213","volume":"6","author":"E Latif","year":"2024","unstructured":"Latif E, Zhai X. Large language models and automated essay scoring of English language learner writing: insights into validity and reliability. Comput Educ Artif Intell. 2024;6:100213. https:\/\/doi.org\/10.1016\/j.caeai.2024.100213.","journal-title":"Comput Educ Artif Intell"},{"key":"1002_CR57","doi-asserted-by":"crossref","unstructured":"Amin T, Aadil F, Awan KM, Lim S, et al. Enhancing essay scoring: an analytical and holistic approach with few-shot transformer-based models. IEEE Access. 2025.","DOI":"10.1109\/ACCESS.2025.3530272"},{"issue":"4","key":"1002_CR58","first-page":"309","volume":"5","author":"U Connor","year":"1985","unstructured":"Connor U, Lauer J. Understanding persuasive essay writing: linguistic\/rhetorical approach. Text-Interdiscip J Study Discourse. 
1985;5(4):309\u201326.","journal-title":"Text-Interdiscip J Study Discourse"},{"issue":"1","key":"1002_CR59","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1111\/j.1745-3992.2011.00223.x","volume":"31","author":"DM Williamson","year":"2012","unstructured":"Williamson DM, Xi X, Breyer FJ. A framework for evaluation and use of automated scoring. Educ Meas Issues Pract. 2012;31(1):2\u201313.","journal-title":"Educ Meas Issues Pract"},{"key":"1002_CR60","unstructured":"Shermis MD, Hamner B. Contrasting state-of-the-art automated scoring of essays: Analysis. In: Annual National Council on Measurement in Education Meeting. 2012."},{"key":"1002_CR61","unstructured":"Yannakoudakis H, Briscoe T, Medlock B. A new dataset and method for automatically grading ESOL texts. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies. 2011. pp. 180\u2013189 ."},{"issue":"4","key":"1002_CR62","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1037\/h0026256","volume":"70","author":"J Cohen","year":"1968","unstructured":"Cohen J. Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol Bull. 1968;70(4):213\u201320.","journal-title":"Psychol Bull"},{"issue":"3","key":"1002_CR63","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1177\/001316447303300309","volume":"33","author":"JL Fleiss","year":"1973","unstructured":"Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Measur. 1973;33(3):613\u20139.","journal-title":"Educ Psychol Measur"},{"key":"1002_CR64","doi-asserted-by":"publisher","first-page":"1880","DOI":"10.1109\/TLT.2024.3396873","volume":"17","author":"Y Song","year":"2024","unstructured":"Song Y, Zhu Q, Wang H, Zheng Q. Automated essay scoring and revising based on open-source large language models. IEEE Trans Learn Technol. 
2024b;17:1880\u201390.","journal-title":"IEEE Trans Learn Technol"},{"issue":"1","key":"1002_CR65","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159\u201374.","journal-title":"Biometrics"},{"key":"1002_CR66","doi-asserted-by":"publisher","DOI":"10.1201\/9780429258589","volume-title":"Practical statistics for medical research","author":"DG Altman","year":"1990","unstructured":"Altman DG. Practical statistics for medical research. London: Chapman and Hall\/CRC; 1990."},{"key":"1002_CR67","volume-title":"Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among raters","author":"KL Gwet","year":"2014","unstructured":"Gwet KL. Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among raters. 4th ed. Piedmont: Advanced Analytics, LLC; 2014.","edition":"4"}],"container-title":["Discover Artificial 
Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44163-026-01002-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-026-01002-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-026-01002-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T14:03:40Z","timestamp":1771855420000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44163-026-01002-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,18]]},"references-count":67,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["1002"],"URL":"https:\/\/doi.org\/10.1007\/s44163-026-01002-y","relation":{},"ISSN":["2731-0809"],"issn-type":[{"value":"2731-0809","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,18]]},"assertion":[{"value":"18 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 February 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no 
conflict of interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"166"}}