{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T14:41:30Z","timestamp":1775486490474,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T00:00:00Z","timestamp":1740960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,3,3]]},"DOI":"10.1145\/3706468.3706527","type":"proceedings-article","created":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:04:11Z","timestamp":1740146651000},"page":"462-472","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3380-4641","authenticated-orcid":false,"given":"Kathrin","family":"Se\u00dfler","sequence":"first","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1090-9299","authenticated-orcid":false,"given":"Maurice","family":"F\u00fcrstenberg","sequence":"additional","affiliation":[{"name":"University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1679-4979","authenticated-orcid":false,"given":"Babette","family":"B\u00fchler","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3146-4484","authenticated-orcid":false,"given":"Enkelejda","family":"Kasneci","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}]}],"member":"320","published-online":{"date-parts":[[2025,3,3]]},"reference":[{"key":"e_1_3_3_1_2_2","unstructured":"AI@Meta. 2024. Llama 3 Model Card."},{"key":"e_1_3_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-53960-2_1"},{"key":"e_1_3_3_1_4_2","unstructured":"Anthropic. 2023. Claude. https:\/\/www.anthropic.com\/"},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Xiaoyu Bai and Manfred Stede. 2023. A survey of current machine learning approaches to student free-text evaluation for intelligent tutoring. International Journal of Artificial Intelligence in Education 33 4 (2023) 992\u20131030.","DOI":"10.1007\/s40593-022-00323-0"},{"key":"e_1_3_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Majdi Beseiso Omar\u00a0A Alzubi and Hasan Rashaideh. 2021. A novel automated essay scoring approach for reliable higher educational assessments. Journal of Computing in Higher Education 33 (2021) 727\u2013746.","DOI":"10.1007\/s12528-021-09283-1"},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Arne Bewersdorff Christian Hartmann Marie Hornberger Kathrin Se\u00dfler Maria Bannert Enkelejda Kasneci Gjergji Kasneci Xiaoming Zhai and Claudia Nerdel. 2024. Taking the next step with generative artificial intelligence: The transformative role of multimodal large language models in science education. arxiv:https:\/\/arXiv.org\/abs\/2401.00832","DOI":"10.1016\/j.lindif.2024.102601"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Arne Bewersdorff Kathrin Se\u00dfler Armin Baur Enkelejda Kasneci and Claudia Nerdel. 2023. Assessing student errors in experimentation using artificial intelligence and large language models: A comparative study with human raters. Computers and Education: Artificial Intelligence 5 (2023) 100177.","DOI":"10.1016\/j.caeai.2023.100177"},{"key":"e_1_3_3_1_9_2","first-page":"701","volume-title":"EDM","author":"Bhat Shravya","year":"2022","unstructured":"Shravya Bhat, Huy\u00a0Anh Nguyen, Steven Moore, John\u00a0C Stamper, Majd Sakr, and Eric Nyberg. 2022. Towards Automated Generation and Evaluation of Questions in Educational Domains.. In EDM, Antonija Mitrovic and Nigel Bosch (Eds.). International Educational Data Mining Society, Durham, United Kingdom, 701\u2013704."},{"key":"e_1_3_3_1_10_2","unstructured":"Peter Birkel and Claudia Birkel. 2002. Wie einig sind sich Lehrer bei der Aufsatzbeurteilung? Eine Replikationsstudie zur Untersuchung von Rudolf Weiss. Psychologie in Erziehung und Unterricht 49 3 (2002) 219\u2013224."},{"key":"e_1_3_3_1_11_2","volume-title":"TOEFL11: A Corpus of Non-Native English","author":"Blanchard Daniel","year":"2013","unstructured":"Daniel Blanchard, Joel Tetreault, Derrick Higgins, Aoife Cahill, and Martin Chodorow. 2013. TOEFL11: A Corpus of Non-Native English. Technical Report. Educational Testing Service."},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.870"},{"key":"e_1_3_3_1_13_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171\u20134186."},{"key":"e_1_3_3_1_14_2","first-page":"475","volume-title":"The 14th International Conference on Educational Data Mining (EDM21)","author":"Doewes Afrizal","year":"2021","unstructured":"Afrizal Doewes and Mykola Pechenizkiy. 2021. On the Limitations of Human-Computer Agreement in Automated Essay Scoring.. In The 14th International Conference on Educational Data Mining (EDM21). International Educational Data Mining Society, Paris, France, 475\u2013480."},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Jonas Flod\u00e9n. 2024. Grading exams using large language models: A comparison between human and AI grading of exams in higher education using ChatGPT. British Educational Research Journal 00 (2024) 1\u201324.","DOI":"10.1002\/berj.4069"},{"key":"e_1_3_3_1_16_2","unstructured":"Google Gemini\u00a0Team. 2024. Gemini: A Family of Highly Capable Multimodal Models."},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Arthur\u00a0C Graesser Danielle\u00a0S McNamara Max\u00a0M Louwerse and Zhiqiang Cai. 2004. Coh-Metrix: Analysis of text on cohesion and language. Behavior research methods instruments & computers 36 2 (2004) 193\u2013202.","DOI":"10.3758\/BF03195564"},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-322-87644-7"},{"key":"e_1_3_3_1_19_2","doi-asserted-by":"crossref","unstructured":"Veronika Hackl Alexandra\u00a0Elena M\u00fcller Michael Granitzer and Maximilian Sailer. 2023. Is GPT-4 a reliable rater? Evaluating consistency in GPT-4\u2019s text ratings. Frontiers in Education 8 1272229.","DOI":"10.3389\/feduc.2023.1272229"},{"key":"e_1_3_3_1_20_2","unstructured":"Ben Hamner Jaison Morgan Iynnvandev Mark Shermis and Tom\u00a0Vander Ark. 2012. The hewlett foundation: Automated essay scoring.https:\/\/www.kaggle.com\/c\/asap-aes"},{"key":"e_1_3_3_1_21_2","unstructured":"Hendrik Haverkamp Malte Hecht and Kirsten Schindler. 2024. Lernf\u00f6rderliches Feedback KI-basiert vermitteln. Der Deutschunterricht 5 (2024)."},{"key":"e_1_3_3_1_22_2","doi-asserted-by":"crossref","unstructured":"Hyangeun Ji Insook Han and Yujung Ko. 2023. A systematic review of conversational AI in language education: Focusing on the collaboration with human teachers. Journal of Research on Technology in Education 55 1 (2023) 48\u201363.","DOI":"10.1080\/15391523.2022.2142873"},{"key":"e_1_3_3_1_23_2","volume-title":"Mistral 7B (2023)","author":"Jiang AQ","year":"2023","unstructured":"AQ Jiang, A Sablayrolles, A Mensch, C Bamford, DS Chaplot, D de\u00a0las Casas, F Bressand, G Lengyel, G Lample, L Saulnier, et\u00a0al. 2023. Mistral 7B (2023). Technical Report. Mistral AI."},{"key":"e_1_3_3_1_24_2","volume-title":"Mixtral of experts","author":"Jiang Albert\u00a0Q","year":"2024","unstructured":"Albert\u00a0Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra\u00a0Singh Chaplot, Diego de\u00a0las Casas, Emma\u00a0Bou Hanna, Florian Bressand, et\u00a0al. 2024. Mixtral of experts. Technical Report. Mistral AI."},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Enkelejda Kasneci Kathrin Se\u00dfler Stefan K\u00fcchemann Maria Bannert Daryna Dementieva Frank Fischer Urs Gasser Georg Groh Stephan G\u00fcnnemann Eyke H\u00fcllermeier et\u00a0al. 2023. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and individual differences 103 (2023) 102274.","DOI":"10.1016\/j.lindif.2023.102274"},{"key":"e_1_3_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/879"},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Terry\u00a0K Koo and Mae\u00a0Y Li. 2016. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15 2 (2016) 155\u2013163.","DOI":"10.1016\/j.jcm.2016.02.012"},{"key":"e_1_3_3_1_28_2","volume-title":"Lehrkr\u00e4fteeinstellungsbedarf und -angebot in der Bundesrepublik Deutschland 2023 \u2013 2035: Zusammengefasste Modellrechnungen der L\u00e4nder","year":"2023","unstructured":"Kultusministerkonferenz. 2023. Lehrkr\u00e4fteeinstellungsbedarf und -angebot in der Bundesrepublik Deutschland 2023 \u2013 2035: Zusammengefasste Modellrechnungen der L\u00e4nder. Dokumentation 238. Sekretariat der St\u00e4ndigen Konferenz der Kultusminister der L\u00e4nder in der Bundesrepublik Deutschland, Berlin. Beschluss der Kultusministerkonferenz vom 08.12.2023."},{"key":"e_1_3_3_1_29_2","doi-asserted-by":"crossref","unstructured":"Gyeong-Geon Lee Ehsan Latif Xuansheng Wu Ninghao Liu and Xiaoming Zhai. 2024. Applying large language models and chain-of-thought for automatic scoring. Computers and Education: Artificial Intelligence 6 (2024) 100213.","DOI":"10.1016\/j.caeai.2024.100213"},{"key":"e_1_3_3_1_30_2","unstructured":"Dogan\u00a0Gursoy Mesut\u00a0Cicek and Lu Lu. 2024. Adverse impacts of revealing the presence of \u201cArtificial Intelligence (AI)\u201d technology in product and service descriptions on purchase intentions: the mediating role of emotional trust and the moderating role of perceived risk. Journal of Hospitality Marketing & Management 0 0 (2024) 1\u201323."},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"crossref","unstructured":"Atsushi Mizumoto and Masaki Eguchi. 2023. Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics 2 2 (2023) 100050.","DOI":"10.1016\/j.rmal.2023.100050"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"crossref","unstructured":"Leo Morjaria Levi Burns Keyna Bracken Anthony\u00a0J Levinson Quang\u00a0N Ngo Mark Lee and Matthew Sibbald. 2024. Examining the Efficacy of ChatGPT in Marking Short-Answer Assessments in an Undergraduate Medical Program. International Medical Education 3 1 (2024) 32\u201343.","DOI":"10.3390\/ime3010004"},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"crossref","unstructured":"Nora M\u00fcller Till Utesch and Vera Busse. 2023. Qualit\u00e4t statt Quantit\u00e4t? Zum Zusammenhang von Schreibf\u00f6rderungs-und Feedbackpraktiken mit Textqualit\u00e4t unter Ber\u00fccksichtigung von migrationsbedingter Mehrsprachigkeit. Unt.wiss. Zeits. f. Lernforschung 51 2 (2023) 169\u2013198.","DOI":"10.1007\/s42010-023-00173-2"},{"key":"e_1_3_3_1_34_2","unstructured":"Sonia Alejandrina\u00a0Sotelo Mu\u00f1oz Giovanna\u00a0Guti\u00e9rrez Gayoso Alberto\u00a0Caceres Huambo Rogelio Domingo\u00a0Cahuana Tapia Jorge\u00a0Layme Incaluque Oscar Eduardo\u00a0Pongo Aguila Juan Cielo\u00a0Ram\u00edrez Cajamarca Jesus Enrique\u00a0Reyes Acevedo Herbert Victor\u00a0Huaranga Rivera and Jos\u00e9\u00a0Luis Arias-Gonz\u00e1les. 2023. Examining the impacts of ChatGPT on student motivation and engagement. Social Space 23 1 (2023) 1\u201327."},{"key":"e_1_3_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.3249\/webdoc-3971"},{"key":"e_1_3_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.bea-1.32"},{"key":"e_1_3_3_1_37_2","doi-asserted-by":"crossref","unstructured":"Tanya Nazaretsky Moriah Ariely Mutlu Cukurova and Giora Alexandron. 2022. Teachers\u2019 trust in AI-powered educational technology and a professional development program to improve it. British journal of educational technology 53 4 (2022) 914\u2013931.","DOI":"10.1111\/bjet.13232"},{"key":"e_1_3_3_1_38_2","first-page":"278","volume-title":"European Conference on Technology Enhanced Learning","author":"Nguyen Huy\u00a0A","year":"2023","unstructured":"Huy\u00a0A Nguyen, Hayden Stec, Xinying Hou, Sarah Di, and Bruce\u00a0M McLaren. 2023. Evaluating chatgpt\u2019s decimal skills and feedback generation in a digital learning game. In European Conference on Technology Enhanced Learning, Olga Viberg, Ioana Jivet, Pedro\u00a0J. Mu\u00f1oz-Merino, Maria Perifanou, and Tina Papathoma (Eds.). Springer Nature Switzerland, Cham, 278\u2013293."},{"key":"e_1_3_3_1_39_2","volume-title":"GPT-4 technical report","year":"2023","unstructured":"OpenAI. 2023. GPT-4 technical report. Technical Report. OpenAI."},{"key":"e_1_3_3_1_40_2","volume-title":"OpenAI o1 System Card","year":"2024","unstructured":"OpenAI. 2024. OpenAI o1 System Card. Technical Report. OpenAI."},{"key":"e_1_3_3_1_41_2","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal et\u00a0al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022) 27730\u201327744."},{"key":"e_1_3_3_1_42_2","unstructured":"Ulrike Pad\u00f3 Yunus Eryilmaz and Larissa Kirschner. 2023. Short-Answer Grading for German: Addressing the Challenges. International Journal of Artificial Intelligence in Education (2023) 1\u201332."},{"key":"e_1_3_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613372.3614197"},{"key":"e_1_3_3_1_44_2","unstructured":"Sanna Pohlmann-Rother Edgar Schoreit and Anja K\u00fcrzinger. 2016. Schreibkompetenzen von Erstkl\u00e4sslern quantitativ-empirisch erfassen-Herausforderungen und Zugewinn eines analytisch-kriterialen Vorgehens gegen\u00fcber einer holistischen Bewertung. Journal for educational research online 8 2 (2016) 107\u2013135."},{"key":"e_1_3_3_1_45_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1 8 (2019) 9."},{"key":"e_1_3_3_1_46_2","doi-asserted-by":"crossref","unstructured":"Dadi Ramesh and Suresh\u00a0Kumar Sanampudi. 2022. An automated essay scoring systems: a systematic literature review. Artificial Intelligence Review 55 3 (2022) 2495\u20132527.","DOI":"10.1007\/s10462-021-10068-2"},{"key":"e_1_3_3_1_47_2","first-page":"65","volume-title":"International conference on artificial intelligence in education technology","author":"Sawatzki J\u00f6rg","year":"2021","unstructured":"J\u00f6rg Sawatzki, Tim Schlippe, and Marian Benner-Wickner. 2021. Deep learning techniques for automatic short answer grading: Predicting scores for English and German answers. In International conference on artificial intelligence in education technology, Eric C.\u00a0K. Cheng, Rekha\u00a0B. Koul, Tianchong Wang, and Xinguo Yu (Eds.). Springer Nature Singapore, Singapore, 65\u201375."},{"key":"e_1_3_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.3278\/9783763972494"},{"key":"e_1_3_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-42682-7_73"},{"key":"e_1_3_3_1_50_2","unstructured":"Maja Stahl Leon Biermann Andreas Nehring and Henning Wachsmuth. 2024. Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation. arxiv:https:\/\/arXiv.org\/abs\/2404.15845\u00a0[cs.CL]"},{"key":"e_1_3_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-23204-7_39"},{"key":"e_1_3_3_1_52_2","volume-title":"Llama 2: Open foundation and fine-tuned chat models","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. Technical Report. GenAI, Meta."},{"key":"e_1_3_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.535"},{"key":"e_1_3_3_1_54_2","doi-asserted-by":"crossref","unstructured":"Hester van Herk Ype\u00a0H. Poortinga and Theo M.\u00a0M. Verhallen. 2004. Response Styles in Rating Scales: Evidence of Method Bias in Data From Six EU Countries. Journal of Cross-Cultural Psychology 35 3 (2004) 346\u2013360.","DOI":"10.1177\/0022022104264126"},{"key":"e_1_3_3_1_55_2","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Fei Xia Ed Chi Quoc\u00a0V Le Denny Zhou et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022) 24824\u201324837."},{"key":"e_1_3_3_1_56_2","doi-asserted-by":"crossref","unstructured":"Jin Xue Xiaoyi Tang and Liyan Zheng. 2021. A hierarchical BERT-based transfer learning approach for multi-dimensional essay scoring. Ieee Access 9 (2021) 125403\u2013125415.","DOI":"10.1109\/ACCESS.2021.3110683"},{"key":"e_1_3_3_1_57_2","doi-asserted-by":"crossref","unstructured":"Na Zhai and Xiaomei Ma. 2022. Automated writing evaluation (AWE) feedback: a systematic investigation of college students\u2019 acceptance. Computer Assisted Language Learning 35 9 (2022) 2817\u20132842.","DOI":"10.1080\/09588221.2021.1897019"}],"event":{"name":"LAK '25: The 15th International Learning Analytics and Knowledge Conference","location":"Dublin Ireland","acronym":"LAK 2025"},"container-title":["Proceedings of the 15th International Learning Analytics and Knowledge Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3706468.3706527","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3706468.3706527","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:56:51Z","timestamp":1750298211000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3706468.3706527"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,3]]},"references-count":56,"alternative-id":["10.1145\/3706468.3706527","10.1145\/3706468"],"URL":"https:\/\/doi.org\/10.1145\/3706468.3706527","relation":{},"subject":[],"published":{"date-parts":[[2025,3,3]]},"assertion":[{"value":"2025-03-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}