{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T05:10:48Z","timestamp":1771045848883,"version":"3.50.1"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Interact. Intell. Syst."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>\n                    Code-reading ability has traditionally been under-emphasized in assessments as it is difficult to assess at scale. Prior research has shown that code-reading and code-writing are closely related skills; thus being able to assess and train code reading skills may be necessary for student learning. One way to assess code-reading ability is using Explain in Plain English (EiPE) questions, which ask students to describe what a piece of code does with natural language. Previous research deployed a binary (correct\/incorrect) autograder using bigram models that performed comparably with human teaching assistants on student responses. With a dataset of 3,064 student responses from 17 EiPE questions, we investigated multiple autograders for EiPE questions. We evaluated methods as simple as logistic regression trained on bigram features, to more complicated Support Vector Machines (SVMs) trained on embeddings from Large Language Models (LLMs) to GPT-4. We found multiple useful autograders, most with accuracies in the\n                    <jats:inline-formula content-type=\"math\/tex\">\n                      <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(86\\!\\!-\\!\\!88\\%\\)<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    range, with different advantages. 
SVMs trained on LLM embeddings had the highest accuracy; few-shot chat completion with GPT-4 required minimal human effort; pipelines with multiple autograders for specific dimensions (what we call 3D autograders) can provide fine-grained feedback; and code generation with GPT-4 to leverage automatic code testing as a grading mechanism in exchange for slightly more lenient grading standards. While piloting these autograders in a non-major introductory Python course, students had largely similar views of all autograders, although they more often found the GPT-based grader and code-generation graders more helpful and liked the code-generation grader the most.\n                  <\/jats:p>","DOI":"10.1145\/3774752","type":"journal-article","created":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T13:28:42Z","timestamp":1762435722000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Evaluating AI Models for Autograding Explain in Plain English Questions: Challenges and Considerations"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4730-447X","authenticated-orcid":false,"given":"Max","family":"Fowler","sequence":"first","affiliation":[{"name":"Computer Science, University of Illinois, Urbana, Illinois, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4821-162X","authenticated-orcid":false,"given":"Chinedu","family":"Emeka","sequence":"additional","affiliation":[{"name":"Computer Science, University of Illinois, Urbana, Illinois, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9033-1281","authenticated-orcid":false,"given":"Binglin","family":"Chen","sequence":"additional","affiliation":[{"name":"Computer Science, University of Illinois, Urbana, Illinois, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6572-4347","authenticated-orcid":false,"given":"David H.","family":"Smith IV","sequence":"additional","affiliation":[{"name":"Computer Science, 
Virginia Tech, Blacksburg, Virginia, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7605-0050","authenticated-orcid":false,"given":"Matthew","family":"West","sequence":"additional","affiliation":[{"name":"Mechanical Science and Engineering, University of Illinois, Urbana, Illinois, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4601-4398","authenticated-orcid":false,"given":"Craig","family":"Zilles","sequence":"additional","affiliation":[{"name":"Computer Science, University of Illinois, Urbana, Illinois, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,12,10]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"OpenAI. 2024. How can I use the Chat Completion (ChatGPT) API? Retrieved from https:\/\/help.openai.com\/en\/articles\/7232945-how-can-i-use-the-chat-completion-chatgpt-api"},{"key":"e_1_3_2_3_2","first-page":"67","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Amoozadeh Matin","year":"2024","unstructured":"Matin Amoozadeh, David Daniels, Daye Nam, Aayush Kumar, Stella Chen, Michael Hilton, Sruti Srinivasa Ragavan, and Mohammad Amin Alipour. 2024. Trust in generative AI among students: An exploratory study. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 1, 67\u201373."},{"key":"e_1_3_2_4_2","volume-title":"Evaluating the Quality of Learning: The SOLO Taxonomy (Structure of the Observed Learning Outcome)","author":"Biggs John B.","year":"2014","unstructured":"John B. Biggs and Kevin F. Collis. 2014. Evaluating the Quality of Learning: The SOLO Taxonomy (Structure of the Observed Learning Outcome). Academic Press."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-84060-0_5"},{"key":"e_1_3_2_6_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg et al. 2023. 
Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712. Retrieved from https:\/\/arxiv.org\/abs\/2303.12712"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40593-014-0026-8"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-52240-7_8"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3328778.3366879"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Jianlv Chen Shitao Xiao Peitian Zhang Kun Luo Defu Lian and Zheng Liu. 2024. BGE M3-embedding: Multi-lingual multi-functionality multi-granularity text embeddings through self-knowledge distillation. arXiv:2402.03216. Retrieved from https:\/\/arxiv.org\/abs\/2402.03216","DOI":"10.18653\/v1\/2024.findings-acl.137"},{"key":"e_1_3_2_11_2","first-page":"61","volume-title":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education","volume":"1","author":"Cipriano Bruno Pereira","year":"2023","unstructured":"Bruno Pereira Cipriano and Pedro Alves. 2023. GPT-3 vs object oriented programming assignments: An experience report. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education, Vol. 1, 61\u201367."},{"key":"e_1_3_2_12_2","first-page":"1548","volume-title":"Proceedings of the 6th International Conference (RIAO \u201900)","author":"Cowie James R.","year":"2000","unstructured":"James R. Cowie, Yevgeny Ludovik, Hugo Molina-Salgado, Sergei Nirenburg, and Svetlana Sheremetyeva. 2000. Automatic question answering. In Proceedings of the 6th International Conference (RIAO \u201900), 1548\u20131557."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-023-01892-9"},{"key":"e_1_3_2_14_2","first-page":"1136","volume-title":"Proceedings of the 54th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Denny Paul","year":"2023","unstructured":"Paul Denny, Viraj Kumar, and Nasser Giacaman. 2023. 
Conversing with copilot: Exploring prompt engineering for solving cs1 problems using natural language. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education, Vol. 1, 1136\u20131142."},{"key":"e_1_3_2_15_2","first-page":"296","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education (SIGCSE \u201924)","volume":"1","author":"Denny Paul","year":"2024","unstructured":"Paul Denny, Juho Leinonen, James Prather, Andrew Luxton-Reilly, Thezyrie Amarouche, Brett A. Becker, and Brent N. Reeves. 2024. Prompt problems: A new programming exercise for the generative AI era. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education (SIGCSE \u201924), Vol. 1, ACM, New York, NY, 296\u2013302."},{"key":"e_1_3_2_16_2","first-page":"283","volume-title":"Proceedings of the 2024 on Innovation and Technology in Computer Science Education","volume":"1","author":"Denny Paul","year":"2024","unstructured":"Paul Denny, David H. Smith IV, Max Fowler, James Prather, Brett A. Becker, and Juho Leinonen. 2024. Explaining code with a purpose: An integrated approach for developing code comprehension and prompting skills. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education, Vol. 
1, 283\u2013289."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706468.3706481"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511861.3511863"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616243"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3408877.3432539"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3446871.3469738"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1080\/08993408.2022.2079866"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-03928-8_31"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1007\/978-3-031-64299-9_20","volume-title":"Artificial Intelligence in Education","author":"Ghimire Aashish","year":"2024","unstructured":"Aashish Ghimire and John Edwards. 2024. Coding with AI: How Are Tools Like ChatGPT Being Used by Students in Foundational Programming Courses. In Artificial Intelligence in Education, Andrew M. Olney, Irene-Angelica Chounta, Zitao Liu, Olga C. Santos, and Ig Ibert Bittencourt (Eds.), Springer Nature Switzerland, Cham, 259\u2013267."},{"key":"e_1_3_2_25_2","unstructured":"Stefan Haller Adina Aldea Christin Seifert and Nicola Strisciuglio. 2022. Survey on automated short answer grading with deep learning: From word embeddings to transformers. arXiv:2204.03503. Retrieved from https:\/\/arxiv.org\/abs\/2204.03503"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445424"},{"key":"e_1_3_2_27_2","first-page":"569","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Ishizue Ryosuke","year":"2024","unstructured":"Ryosuke Ishizue, Kazunori Sakamoto, Hironori Washizaki, and Yoshiaki Fukazawa. 2024. Improved program repair methods using refactoring with GPT models. 
In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 1, 569\u2013575."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR66628.2025.00081"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626252.3630897"},{"key":"e_1_3_2_30_2","first-page":"4171","volume-title":"Proceedings of NAACL-HLT","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, 4171\u20134186."},{"key":"e_1_3_2_31_2","first-page":"88","volume-title":"Proceedings of the 2024 on ACM Virtual Global Computing Education Conference","volume":"1","author":"Kerslake Chris","year":"2024","unstructured":"Chris Kerslake, Paul Denny, David H. Smith, IV, James Prather, Juho Leinonen, Andrew Luxton-Reilly, and Stephen MacNeil. 2024. Integrating natural language prompting tasks in introductory programming courses. In Proceedings of the 2024 on ACM Virtual Global Computing Education Conference, Vol. 1, 88\u201394."},{"key":"e_1_3_2_32_2","unstructured":"Natalie Kiesler and Daniel Schiffner. 2023. Large language models in introductory programming education: ChatGPT\u2019s performance and implications for assessments. arXiv:2308.08572. Retrieved from https:\/\/arxiv.org\/abs\/2308.08572"},{"key":"e_1_3_2_33_2","first-page":"2046","volume-title":"Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI \u201917)","author":"Kumar Sachin","year":"2017","unstructured":"Sachin Kumar, Soumen Chakrabarti, and Shourya Roy. 2017. Earth mover\u2019s distance pooling over Siamese LSTMs for automatic short answer grading. 
In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI \u201917), 2046\u20132052."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1025779619903"},{"key":"e_1_3_2_35_2","first-page":"124","volume-title":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education","volume":"1","author":"Leinonen Juho","year":"2023","unstructured":"Juho Leinonen, Paul Denny, Stephen MacNeil, Sami Sarsa, Seth Bernstein, Joanne Kim, Andrew Tran, and Arto Hellas. 2023. Comparing code explanations created by students and large language models. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education, Vol. 1, 124\u2013130."},{"key":"e_1_3_2_36_2","first-page":"159","volume-title":"Proceedings of the 2023 ACM Conference on International Computing Education Research (ICER \u201923)","volume":"1","author":"Li Tiffany Wenting","year":"2023","unstructured":"Tiffany Wenting Li, Silas Hsu, Max Fowler, Zhilin Zhang, Craig Zilles, and Karrie Karahalios. 2023. Am I wrong, or is the autograder wrong? Effects of AI grading mistakes on learning. In Proceedings of the 2023 ACM Conference on International Computing Education Research (ICER \u201923), Vol. 1, 159\u2013176."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1595496.1562930"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1140123.1140157"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626252.3630938"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643674"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1404520.1404531"},{"key":"e_1_3_2_42_2","volume-title":"Automatic Short Answer Grading Using Deep Learning","author":"Jinzhu Luo","year":"2021","unstructured":"Luo Jinzhu. 2021. Automatic Short Answer Grading Using Deep Learning. Ph.D. Dissertation. 
Illinois State University."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3545945.3569785"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649217.3653621"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/572133.572137"},{"key":"e_1_3_2_46_2","first-page":"276","volume-title":"Biochemia Medica","volume":"22","author":"McHugh Mary L.","year":"2012","unstructured":"Mary L. McHugh. 2012. Interrater reliability: The kappa statistic. Biochemia Medica 22, 3 (2012), 276\u2013282."},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1145\/2157136.2157249","volume-title":"Proceedings of the 43rd ACM Technical Symposium on Computer Science Education","author":"Murphy Laurie","year":"2012","unstructured":"Laurie Murphy, Ren\u00e9e McCauley, and Sue Fitzgerald. 2012. \u201cExplain in plain English\u201d questions: Implications for teaching. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education, 385\u2013390."},{"key":"e_1_3_2_48_2","first-page":"958","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Nguyen Ha","year":"2024","unstructured":"Ha Nguyen and Vicki Allan. 2024. Using GPT-4 to provide tiered, formative code feedback. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 1, 958\u2013964."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642706"},{"key":"e_1_3_2_50_2","first-page":"1063","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Poulsen Seth","year":"2024","unstructured":"Seth Poulsen, Sami Sarsa, James Prather, Juho Leinonen, Brett A. Becker, Arto Hellas, Paul Denny, and Brent N. Reeves. 2024. Solving proof block problems using large language models. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 
1, 1063\u20131069."},{"key":"e_1_3_2_51_2","first-page":"1583","volume-title":"Proceedings of the 56th ACM Technical Symposium on Computer Science Education","volume":"2","author":"Newar Dip Kiran Pradhan","year":"2025","unstructured":"Dip Kiran Pradhan Newar, Max Fowler, David H. Smith, IV, and Seth Poulsen. 2025. Mining hierarchies with conviction: Constructing the CS1 skill hierarchy with pairwise comparisons over skill distributions. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education, Vol. 2, 1583\u20131584."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3623762.3633499"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649405.3659534"},{"issue":"8","key":"e_1_3_2_54_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_55_2","first-page":"938","volume-title":"Proceedings of the 56th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Raihan Nishat","year":"2025","unstructured":"Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, and Marcos Zampieri. 2025. Large language models in computer science education: A systematic literature review. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education, Vol. 1, 938\u2013944."},{"key":"e_1_3_2_56_2","first-page":"299","volume-title":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education","volume":"1","author":"Reeves Brent","year":"2023","unstructured":"Brent Reeves, Sami Sarsa, James Prather, Paul Denny, Brett A. Becker, Arto Hellas, Bailey Kimmel, Garrett Powell, and Juho Leinonen. 2023. 
Evaluating the performance of code generation models for solving Parsons problems with small prompt variations. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education, Vol. 1, 299\u2013305."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.caeai.2025.100428"},{"key":"e_1_3_2_59_2","first-page":"1147","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Rogers Michael P.","year":"2024","unstructured":"Michael P. Rogers, Hannah Miller Hillberg, and Christopher L. Groves. 2024. Attitudes towards the use (and misuse) of ChatGPT: A preliminary study. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 1, 1147\u20131153."},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1007\/978-3-319-93843-1_37","volume-title":"Artificial Intelligence in Education: 19th International Conference, AIED 2018, London, UK, June 27\u201330, 2018, Proceedings, Part I 19","author":"Saha Swarnadeep","year":"2018","unstructured":"Swarnadeep Saha, Tejas I. Dhamecha, Smit Marvaniya, Renuka Sindhgatta, and Bikram Sengupta. 2018. Sentence level or token level features for automatic short answer grading?: Use both. In Artificial Intelligence in Education: 19th International Conference, AIED 2018, London, UK, June 27\u201330, 2018, Proceedings, Part I 19. Carolyn Penstein Ros\u00e9, Roberto Mart\u00ednez-Maldonado, H. 
Ulrich Hoppe, Rose Luckin, Manolis Mavrikis, Kaska Porayska-Pomsta, Bruce McLaren, and Benedict du Boulay (Eds.), Springer, 503\u2013517."},{"issue":"1","key":"e_1_3_2_61_2","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1109\/TLT.2019.2897997","article-title":"Feature engineering and ensemble-based approach for improving automatic short-answer grading performance","volume":"13","author":"Sahu Archana","year":"2019","unstructured":"Archana Sahu and Plaban Kumar Bhowmick. 2019. Feature engineering and ensemble-based approach for improving automatic short-answer grading performance. IEEE Transactions on Learning Technologies 13, 1 (2019), 77\u201390.","journal-title":"IEEE Transactions on Learning Technologies"},{"key":"e_1_3_2_62_2","first-page":"27","volume-title":"Proceedings of the 2022 ACM Conference on International Computing Education Research","volume":"1","author":"Sarsa Sami","year":"2022","unstructured":"Sami Sarsa, Paul Denny, Arto Hellas, and Juho Leinonen. 2022. Automatic generation of programming exercises and code explanations using large language models. In Proceedings of the 2022 ACM Conference on International Computing Education Research, Vol. 1, ACM, New York, NY, 27\u201343."},{"key":"e_1_3_2_63_2","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1007\/978-3-031-64299-9_12","volume-title":"Artificial Intelligence in Education","author":"Scaria Nicy","year":"2024","unstructured":"Nicy Scaria, Suma Dharani Chenna, and Deepak Subramani. 2024. Automated educational question generation at different bloom\u2019s skill levels using large language models: Strategies and evaluation. In Artificial Intelligence in Education, Andrew M. Olney, Irene-Angelica Chounta, Zitao Liu, Olga C. Santos, and Ig Ibert Bittencourt (Eds.). 
Springer Nature Switzerland, Cham, 165\u2013179."},{"key":"e_1_3_2_64_2","first-page":"1223","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Sheard Judy","year":"2024","unstructured":"Judy Sheard, Paul Denny, Arto Hellas, Juho Leinonen, Lauri Malmi, and Simon. 2024. Instructor perceptions of AI code generation tools - A multi-institutional interview study. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 1, 1223\u20131229."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/3657604.3662039","volume-title":"Proceedings of the 11th ACM Conference on Learning@ Scale","author":"Smith David H.","year":"2024","unstructured":"David H. Smith, IV, Paul Denny, and Max Fowler. 2024. Prompting for comprehension: Exploring the intersection of explain in plain English questions and prompt writing. In Proceedings of the 11th ACM Conference on Learning@ Scale, 39\u201350."},{"key":"e_1_3_2_66_2","first-page":"37","volume-title":"Proceedings of the 30th ACM Conference on Innovation and Technology in Computer Science Education","volume":"1","author":"Smith David H.","year":"2025","unstructured":"David H. Smith, IV, Max Fowler, Paul Denny, and Craig Zilles. 2025. Counting the trees in the Forest: Evaluating prompt segmentation for classifying code comprehension level. In Proceedings of the 30th ACM Conference on Innovation and Technology in Computer Science Education, Vol. 1, 37\u201343."},{"key":"e_1_3_2_67_2","first-page":"171","volume-title":"Proceedings of the 2024 on Innovation and Technology in Computer Science Education","volume":"1","author":"Smith David H.","year":"2024","unstructured":"David H. Smith, IV and Craig Zilles. 2024. Code generation based grading: Evaluating an auto-grading mechanism for \u201cexplain-in-Plain-English\u201d questions. 
In Proceedings of the 2024 on Innovation and Technology in Computer Science Education, Vol. 1, 171\u2013177."},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.7275\/96jp-xz07"},{"key":"e_1_3_2_69_2","first-page":"629","volume-title":"Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning through Intelligent and Socially Informed Technology","author":"Sukkarieh Jana Z.","year":"2005","unstructured":"Jana Z. Sukkarieh and Stephen G. Pulman. 2005. Information extraction and machine learning: Auto-marking short free text responses to science questions. In Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning through Intelligent and Socially Informed Technology, 629\u2013637."},{"key":"e_1_3_2_70_2","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1007\/978-3-030-23204-7_39","volume-title":"Artificial Intelligence in Education: 20th International Conference, AIED 2019, Chicago, IL, USA, June 25\u201329, 2019, Proceedings, Part I 20","author":"Sung Chul","year":"2019","unstructured":"Chul Sung, Tejas Indulal Dhamecha, and Nirmal Mukhi. 2019. Improving short answer grading using transformer-based pre-training. In Artificial Intelligence in Education: 20th International Conference, AIED 2019, Chicago, IL, USA, June 25\u201329, 2019, Proceedings, Part I 20. Seiji Isotani, Eva Mill\u00e1n, Amy Ogan, Peter Hastings, Bruce McLaren, and Rose Luckin (Eds.), Springer, 469\u2013481."},{"key":"e_1_3_2_71_2","first-page":"1314","volume-title":"Proceedings of the 55th ACM Technical Symposium on Computer Science Education","volume":"1","author":"Taylor Andrew","year":"2024","unstructured":"Andrew Taylor, Alexandra Vassar, Jake Renzella, and Hammond Pearce. 2024. DCC\u2013help: Transforming the role of the compiler by generating context-aware error explanations with large language models. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education, Vol. 
1, 1314\u20131320."},{"key":"e_1_3_2_72_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_73_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_2_74_2","first-page":"334","volume-title":"Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6\u201310, 2020, Proceedings, Part II 21","author":"Uto Masaki","year":"2020","unstructured":"Masaki Uto and Yuto Uchida. 2020. Automated short-answer grading using deep neural networks and item response theory. In Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6\u201310, 2020, Proceedings, Part II 21. Ig Ibert Bittencourt, Mutlu Cukurova, Kasia Muldner, Rose Luckin, and Eva Mill\u00e1n (Eds.), Springer, 334\u2013339."},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3716640.3716656"},{"key":"e_1_3_2_76_2","first-page":"297","volume-title":"Proceedings of the 2024 on Innovation and Technology in Computer Science Education","volume":"1","author":"Vadaparty Annapurna","year":"2024","unstructured":"Annapurna Vadaparty, Daniel Zingaro, David H. Smith, IV, Mounika Padala, Christine Alvarado, Jamie Gorson Benario, and Leo Porter. 2024. CS1-LLM: Integrating LLMs into CS1 instruction. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education, Vol. 
1, 297\u2013303."},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649217.3653600"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40593-022-00322-1"},{"issue":"4","key":"e_1_3_2_79_2","doi-asserted-by":"crossref","first-page":"467","DOI":"10.3390\/bs15040467","article-title":"A systematic review of responses, attitudes, and utilization behaviors on generative AI for teaching and learning in higher education","volume":"15","author":"Wu Fan","year":"2025","unstructured":"Fan Wu, Yang Dang, and Manli Li. 2025. A systematic review of responses, attitudes, and utilization behaviors on generative AI for teaching and learning in higher education. Behavioral Sciences 15, 4 (Apr. 2025), 467.","journal-title":"Behavioral Sciences"},{"key":"e_1_3_2_80_2","unstructured":"Shitao Xiao Zheng Liu Peitian Zhang and Niklas Muennighoff. 2023. C-pack: Packaged resources to advance general Chinese embedding. arXiv:2309.07597. Retrieved from https:\/\/arxiv.org\/abs\/2309.07597"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1080\/08993408.2019.1565235"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","unstructured":"Chenyan Zhao Mariana Silva and Seth Poulsen. 2025. Language models are few-shot graders. In Artificial Intelligence in Education: 26th International Conference AIED 2025 Palermo Italy July 22\u201326 2025. Proceedings Part IV. Springer-Verlag Berlin 3\u201316. 
DOI: 10.1007\/978-3-031-98459-4_1","DOI":"10.1007\/978-3-031-98459-4_1"}],"container-title":["ACM Transactions on Interactive Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3774752","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T08:21:26Z","timestamp":1765354886000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3774752"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,10]]},"references-count":81,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3774752"],"URL":"https:\/\/doi.org\/10.1145\/3774752","relation":{},"ISSN":["2160-6455","2160-6463"],"issn-type":[{"value":"2160-6455","type":"print"},{"value":"2160-6463","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,10]]},"assertion":[{"value":"2024-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}