{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T10:36:28Z","timestamp":1777113388198,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","funder":[{"name":"Leibniz Association under the Leibniz Competition","award":["T163\/2024"],"award-info":[{"award-number":["T163\/2024"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,4,27]]},"DOI":"10.1145\/3785022.3785031","type":"proceedings-article","created":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T09:39:01Z","timestamp":1777109941000},"page":"75-84","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Automatic Short Answer Grading with LLMs: From Memorization to Reasoning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-2568-4556","authenticated-orcid":false,"given":"Longwei","family":"Cong","sequence":"first","affiliation":[{"name":"DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2333-9705","authenticated-orcid":false,"given":"Leon","family":"Hammerla","sequence":"additional","affiliation":[{"name":"Goethe University Frankfurt, Frankfurt am Main, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5461-6383","authenticated-orcid":false,"given":"Sonja","family":"Hahn","sequence":"additional","affiliation":[{"name":"DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5598-9547","authenticated-orcid":false,"given":"Sebastian","family":"Gombert","sequence":"additional","affiliation":[{"name":"DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8407-5314","authenticated-orcid":false,"given":"Hendrik","family":"Drachsler","sequence":"additional","affiliation":[{"name":"DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany and Faculty of Computer Science, Goethe University Frankfurt, Frankfurt am Main, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0412-169X","authenticated-orcid":false,"given":"Ulf","family":"Kroehne","sequence":"additional","affiliation":[{"name":"Chemnitz University of Technology, Chemnitz, Germany and DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany"}]}],"member":"320","published-online":{"date-parts":[[2026,4,26]]},"reference":[{"key":"e_1_3_3_2_2_2","unstructured":"OpenAI (2024). 2024. GPT-4o System Card. arxiv:https:\/\/arXiv.org\/abs\/2410.21276\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2410.21276"},{"key":"e_1_3_3_2_3_2","first-page":"304","volume-title":"International Conference on Artificial Intelligence in Education","author":"Aggarwal Dishank","year":"2025","unstructured":"Dishank Aggarwal, Pritam Sil, Bhaskaran Raman, and Pushpak Bhattacharyya. 2025. \u201cI understand why I got this grade\u201d: Automatic Short Answer Grading (ASAG) with Feedback. In International Conference on Artificial Intelligence in Education. Springer, 304\u2013318."},{"key":"e_1_3_3_2_4_2","doi-asserted-by":"crossref","unstructured":"Nico Andersen Fabian Zehner and Frank Goldhammer. 2023. Semi-automatic coding of open-ended text responses in large-scale assessments. Journal of Computer Assisted Learning 39 3 (2023) 841\u2013854.","DOI":"10.1111\/jcal.12717"},{"key":"e_1_3_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Xiaoyu Bai and Manfred Stede. 2023. A survey of current machine learning approaches to student free-text evaluation for intelligent tutoring. International Journal of Artificial Intelligence in Education 33 4 (2023) 992\u20131030.","DOI":"10.1007\/s40593-022-00323-0"},{"key":"e_1_3_3_2_6_2","unstructured":"Nicolas Boizard Hippolyte Gisserot-Boukhlef Duarte\u00a0M Alves Andr\u00e9 Martins Ayoub Hammal Caio Corro C\u00e9line Hudelot Emmanuel Malherbe Etienne Malaboeuf Fanny Jourdan et\u00a0al. 2025. EuroBERT: scaling multilingual encoders for European languages. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.05500 (2025)."},{"key":"e_1_3_3_2_7_2","first-page":"75","volume-title":"European Conference on Technology Enhanced Learning","author":"Borgards Lena","year":"2024","unstructured":"Lena Borgards, Onur Karademir, Sebastian Strau\u00df, Daniele Di\u00a0Mitri, Marcus Kubsch, Markus Brobeil, Adrian Grimm, Sebastian Gombert, Knut Neumann, Hendrik Drachsler, et\u00a0al. 2024. Achieving Tailored Feedback by Means of a Teacher Dashboard? Insights into Teachers\u2019 Feedback Practices. In European Conference on Technology Enhanced Learning. Springer, 75\u201380."},{"key":"e_1_3_3_2_8_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared\u00a0D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877\u20131901."},{"key":"e_1_3_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Steven Burrows Iryna Gurevych and Benno Stein. 2015. The eras and trends of automatic short answer grading. International journal of artificial intelligence in education 25 1 (2015) 60\u2013117.","DOI":"10.1007\/s40593-014-0026-8"},{"key":"e_1_3_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-52240-7_8"},{"key":"e_1_3_3_2_11_2","first-page":"309","volume-title":"Proceedings of the 19th workshop on innovative use of nlp for building educational applications (bea 2024)","author":"Chamieh Imran","year":"2024","unstructured":"Imran Chamieh, Torsten Zesch, and Klaus Giebermann. 2024. Llms in short answer scoring: Limitations and promise of zero-shot and few-shot approaches. In Proceedings of the 19th workshop on innovative use of nlp for building educational applications (bea 2024). 309\u2013315."},{"key":"e_1_3_3_2_12_2","unstructured":"Branden Chan Stefan Schweter and Timo M\u00f6ller. 2020. German\u2019s next language model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2010.10906 (2020)."},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i21.30363"},{"key":"e_1_3_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.15870304"},{"key":"e_1_3_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-52240-7_14"},{"key":"e_1_3_3_2_16_2","first-page":"4171","volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 4171\u20134186."},{"key":"e_1_3_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.25656\/01:26787"},{"key":"e_1_3_3_2_18_2","unstructured":"Myroslava\u00a0O Dzikovska Rodney\u00a0D Nielsen Chris Brew Claudia Leacock Danilo Giampiccolo Luisa Bentivogli Peter Clark Ido Dagan and Hoa\u00a0T Dang. 2013. Semeval-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge. (2013)."},{"key":"e_1_3_3_2_19_2","first-page":"120","volume-title":"European Conference on Technology Enhanced Learning","author":"Falhs Ann-Christin","year":"2025","unstructured":"Ann-Christin Falhs, Conrad Borchers, Vanessa Echeverria, Kexin Yang, Nikol Rummel, and Vincent Aleven. 2025. How Expertise Levels Shape Preferences and Reflection Needs: Towards AI Reflection Systems for Teacher Empowerment. In European Conference on Technology Enhanced Learning. Springer, 120\u2013125."},{"key":"e_1_3_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706468.3706481"},{"key":"e_1_3_3_2_21_2","volume-title":"The measurement of interrater agreement","author":"Fleiss Joseph\u00a0L","year":"1981","unstructured":"Joseph\u00a0L Fleiss, Bruce Levin, Myunghee\u00a0Cho Paik, et\u00a0al. 1981. The measurement of interrater agreement. Citeseer."},{"key":"e_1_3_3_2_22_2","first-page":"44","volume-title":"International Conference on Artificial Intelligence in Education","author":"Frohn Scott","year":"2025","unstructured":"Scott Frohn, Tyler Burleigh, and Jing Chen. 2025. Automated Scoring of Short Answer Questions with Large Language Models: Impacts of Model, Item, and Rubric Design. In International Conference on Artificial Intelligence in Education. Springer, 44\u201351."},{"key":"e_1_3_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Zhengjie Gao Ao Feng Xinyu Song and Xi Wu. 2019. Target-dependent sentiment classification with BERT. Ieee Access 7 (2019) 154290\u2013154299.","DOI":"10.1109\/ACCESS.2019.2946594"},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"crossref","unstructured":"Sebastian Gombert Daniele Di\u00a0Mitri Onur Karademir Marcus Kubsch Hannah Kolbe Simon Tautz Adrian Grimm Isabell Bohm Knut Neumann and Hendrik Drachsler. 2023. Coding energy knowledge in constructed responses with explainable NLP models. Journal of Computer Assisted Learning 39 3 (2023) 767\u2013786.","DOI":"10.1111\/jcal.12767"},{"key":"e_1_3_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.bea-1.92"},{"key":"e_1_3_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Yu Gu Robert Tinn Hao Cheng Michael Lucas Naoto Usuyama Xiaodong Liu Tristan Naumann Jianfeng Gao and Hoifung Poon. 2021. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH) 3 1 (2021) 1\u201323.","DOI":"10.1145\/3458754"},{"key":"e_1_3_3_2_27_2","unstructured":"Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi et\u00a0al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.12948 (2025)."},{"key":"e_1_3_3_2_28_2","unstructured":"Stefan Haller Adina Aldea Christin Seifert and Nicola Strisciuglio. 2022. Survey on automated short answer grading with deep learning: from word embeddings to transformers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.03503 (2022)."},{"key":"e_1_3_3_2_29_2","unstructured":"Helia Hashemi Jason Eisner Corby Rosset Benjamin Van\u00a0Durme and Chris Kedzie. 2024. LLM-rubric: A multidimensional calibrated approach to automated evaluation of natural language texts. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.00274 (2024)."},{"key":"e_1_3_3_2_30_2","unstructured":"Andreas Helmke and RS J\u00e4ger. 2003. Vergleichsarbeiten (VERA): eine Standortbestimmung zur Sicherung schulischer Kompetenzen. SchulVerwaltung Hessen Rheinland-Pfalz Saarland (in preparation) (2003)."},{"key":"e_1_3_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3657604.3664693"},{"key":"e_1_3_3_2_32_2","unstructured":"Hewlett Foundation. 2012. Automated Student Assessment Prize (ASAP) \u2013 Short Answer Scoring Dataset. https:\/\/www.kaggle.com\/c\/asap-sas. Released as part of the Kaggle competition on short answer scoring."},{"key":"e_1_3_3_2_33_2","unstructured":"Edward\u00a0J Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang Weizhu Chen et\u00a0al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1 2 (2022) 3."},{"key":"e_1_3_3_2_34_2","doi-asserted-by":"publisher","unstructured":"Rashmi Khazanchi Daniele Di\u00a0Mitri and Hendrik Drachsler. 2023. Measuring Efficacy of ALEKS as a Supportive Instructional Tool in K-12 Math Classroom with Underachieving Students. Journal of Computers in Mathematics and Science Teaching 42 (01 2023) 155\u2013176. 10.70725\/204333qjmwhs","DOI":"10.70725\/204333qjmwhs"},{"key":"e_1_3_3_2_35_2","doi-asserted-by":"publisher","unstructured":"Rashmi Khazanchi Daniele Di\u00a0Mitri and Hendrik Drachsler. 2024. The Effect of AI\u2010Based Systems on Mathematics Achievement in Rural Context: A Quantitative Study. Journal of Computer Assisted Learning 41 (11 2024) n\/a\u2013n\/a. 10.1111\/jcal.13098","DOI":"10.1111\/jcal.13098"},{"key":"e_1_3_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.36198\/9783838553085"},{"key":"e_1_3_3_2_37_2","first-page":"9662","volume-title":"Proceedings of the AAAI conference on artificial intelligence","volume":"33","author":"Kumar Yaman","year":"2019","unstructured":"Yaman Kumar, Swati Aggarwal, Debanjan Mahata, Rajiv\u00a0Ratn Shah, Ponnurangam Kumaraguru, and Roger Zimmermann. 2019. Get it scored using autosas\u2014an automated system for scoring short answers. In Proceedings of the AAAI conference on artificial intelligence, Vol.\u00a033. 9662\u20139669."},{"key":"e_1_3_3_2_38_2","doi-asserted-by":"publisher","unstructured":"Zhaohui Li Chengning Zhang Yumi Jin Xuesong Cang Sadhana Puntambekar and Rebecca\u00a0J. Passonneau. 2023. Learning When to Defer to Humans for Short Answer Grading. International Conference on Artificial Intelligence in Education (2023). 10.1007\/978-3-031-36272-9_34","DOI":"10.1007\/978-3-031-36272-9_34"},{"key":"e_1_3_3_2_39_2","unstructured":"Samuel\u00a0A Livingston. 2009. Constructed-Response Test Questions: Why We Use Them; How We Score Them. R&D Connections. Number 11. Educational Testing Service (2009)."},{"key":"e_1_3_3_2_40_2","first-page":"238","volume-title":"FLAIRS","author":"Magooda Ahmed\u00a0Ezzat","year":"2016","unstructured":"Ahmed\u00a0Ezzat Magooda, Mohamed\u00a0A Zahran, Mohsen\u00a0A Rashwan, Hazem\u00a0M Raafat, and Magda\u00a0B Fayek. 2016. Vector Based Techniques for Short Answer Grading.. In FLAIRS. 238\u2013243."},{"key":"e_1_3_3_2_41_2","unstructured":"Shervin Minaee Tomas Mikolov Narjes Nikzad Meysam Chenaghlu Richard Socher Xavier Amatriain and Jianfeng Gao. 2024. Large language models: A survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.06196 (2024)."},{"key":"e_1_3_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002568"},{"key":"e_1_3_3_2_43_2","doi-asserted-by":"crossref","unstructured":"Humza Naveed Asad\u00a0Ullah Khan Shi Qiu Muhammad Saqib Saeed Anwar Muhammad Usman Naveed Akhtar Nick Barnes and Ajmal Mian. 2025. A comprehensive overview of large language models. ACM Transactions on Intelligent Systems and Technology 16 5 (2025) 1\u201372.","DOI":"10.1145\/3744746"},{"key":"e_1_3_3_2_44_2","unstructured":"OpenAI. 2025. gpt-oss-120b & gpt-oss-20b Model Card. arxiv:https:\/\/arXiv.org\/abs\/2508.10925\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2508.10925"},{"key":"e_1_3_3_2_45_2","unstructured":"Fabian Pedregosa Ga\u00ebl Varoquaux Alexandre Gramfort Vincent Michel Bertrand Thirion Olivier Grisel Mathieu Blondel Peter Prettenhofer Ron Weiss Vincent Dubourg et\u00a0al. 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12 (2011) 2825\u20132830."},{"key":"e_1_3_3_2_46_2","doi-asserted-by":"publisher","unstructured":"Susanne Prediger and Lena Wessel. 2013. Fostering German-language learners\u2019 constructions of meanings for fractions\u2014design and effects of a language- and mathematics-integrated intervention. Mathematics Education Research Journal 25 3 (Sept. 2013) 435\u2013456. 10.1007\/s13394-013-0079-2","DOI":"10.1007\/s13394-013-0079-2"},{"key":"e_1_3_3_2_47_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1 8 (2019) 9."},{"key":"e_1_3_3_2_48_2","unstructured":"Pranab Sahoo Ayush\u00a0Kumar Singh Sriparna Saha Vinija Jain Samrat Mondal and Aman Chadha. 2024. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.07927 (2024)."},{"key":"e_1_3_3_2_49_2","unstructured":"Zhihong Shao Peiyi Wang Qihao Zhu Runxin Xu Junxiao Song Xiao Bi Haowei Zhang Mingchuan Zhang YK Li Yang Wu et\u00a0al. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.03300 (2024)."},{"key":"e_1_3_3_2_50_2","volume-title":"NeurIPS 2021 Math AI for Education Workshop","author":"Shen Jia\u00a0Tracy","year":"2021","unstructured":"Jia\u00a0Tracy Shen, Michiharu Yamashita, Ethan Prihar, Neil Heffernan, Xintao Wu, Ben Graff, and Dongwon Lee. 2021. MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education. In NeurIPS 2021 Math AI for Education Workshop."},{"key":"e_1_3_3_2_51_2","doi-asserted-by":"publisher","unstructured":"Erverson B. G.\u00a0de Sousa Bruno Alexandre Rafael Ferreira\u00a0Mello Taciana Pontual\u00a0Falc\u00e3o Boban Vesin and Dragan Ga\u0161evi\u0107. 2021. Applications of Learning Analytics in High Schools: A Systematic Literature Review. Frontiers in Artificial Intelligence Volume 4 - 2021 (2021). 10.3389\/frai.2021.737891","DOI":"10.3389\/frai.2021.737891"},{"key":"e_1_3_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1628"},{"key":"e_1_3_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3698205.3729551"},{"key":"e_1_3_3_2_54_2","doi-asserted-by":"crossref","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Fei Xia Ed Chi Quoc\u00a0V Le Denny Zhou et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022) 24824\u201324837.","DOI":"10.52202\/068431-1800"},{"key":"e_1_3_3_2_55_2","unstructured":"Zhilin Yang Zihang Dai Yiming Yang Jaime Carbonell Russ\u00a0R Salakhutdinov and Quoc\u00a0V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.bea-1.47"},{"key":"e_1_3_3_2_57_2","unstructured":"Zhuosheng Zhang Aston Zhang Mu Li and Alex Smola. 2022. Automatic chain of thought prompting in large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.03493 (2022)."},{"key":"e_1_3_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData52589.2021.9671697"}],"event":{"name":"LAK 2026: LAK26: 16th International Learning Analytics and Knowledge Conference","location":"Bergen Norway","acronym":"LAK 2026"},"container-title":["Proceedings of the LAK26: 16th International Learning Analytics and Knowledge Conference"],"original-title":[],"deposited":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T09:47:39Z","timestamp":1777110459000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3785022.3785031"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,26]]},"references-count":57,"alternative-id":["10.1145\/3785022.3785031","10.1145\/3785022"],"URL":"https:\/\/doi.org\/10.1145\/3785022.3785031","relation":{},"subject":[],"published":{"date-parts":[[2026,4,26]]},"assertion":[{"value":"2026-04-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}