{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T19:01:31Z","timestamp":1754161291959,"version":"3.41.2"},"publisher-location":"New York, NY, USA","reference-count":45,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,23]]},"DOI":"10.1145\/3696630.3727236","type":"proceedings-article","created":{"date-parts":[[2025,7,28]],"date-time":"2025-07-28T19:09:27Z","timestamp":1753729767000},"page":"789-799","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Applying Large Language Models to Enhance the Assessment of Java Programming Assignments"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-8051-1063","authenticated-orcid":false,"given":"Skyler","family":"Grandel","sequence":"first","affiliation":[{"name":"School of Engineering, Vanderbilt University, Nashville, Tennessee, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7389-4995","authenticated-orcid":false,"given":"Douglas","family":"Schmidt","sequence":"additional","affiliation":[{"name":"School of Engineering, Vanderbilt University, Nashville, Tennessee, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4001-3442","authenticated-orcid":false,"given":"Kevin","family":"Leach","sequence":"additional","affiliation":[{"name":"School of Engineering, Vanderbilt University, Nashville, Tennessee, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,7,28]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Anthropic. 2024. Claude Opus. https:\/\/claude.ai\/"},{"key":"e_1_3_2_1_2_1","volume-title":"Ask me anything: A simple strategy for prompting language models. 
arXiv preprint arXiv:2210.02441","author":"Arora Simran","year":"2022","unstructured":"Simran Arora, Avanika Narayan, Mayee F Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, and Christopher R\u00e9. 2022. Ask me anything: A simple strategy for prompting language models. arXiv preprint arXiv:2210.02441 (2022)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","first-page":"52","DOI":"10.61969\/jai.1337500","article-title":"Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning","volume":"7","author":"Baidoo-Anu David","year":"2023","unstructured":"David Baidoo-Anu and Leticia Owusu Ansah. 2023. Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Journal of AI 7, 1 (2023), 52\u201362.","journal-title":"Journal of AI"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","first-page":"9","DOI":"10.58496\/MJCSC\/2023\/002","article-title":"Role of ChatGPT in Computer Programming","volume":"2023","author":"Biswas Som","year":"2023","unstructured":"Som Biswas. 2023. Role of ChatGPT in Computer Programming. Mesopotamian Journal of Computer Science 2023 (2023), 9\u201315.","journal-title":"Mesopotamian Journal of Computer Science"},{"key":"e_1_3_2_1_5_1","volume-title":"It's good to talk? Developing feedback practice. Gateway Papers 1","author":"Blair Alasdair","year":"2010","unstructured":"Alasdair Blair and Samantha McGinty. 2010. It's good to talk? Developing feedback practice. Gateway Papers 1 (2010)."},{"key":"e_1_3_2_1_6_1","volume-title":"A categorical archive of chatgpt failures. arXiv preprint arXiv:2302.03494","author":"Borji Ali","year":"2023","unstructured":"Ali Borji. 2023. A categorical archive of chatgpt failures. 
arXiv preprint arXiv:2302.03494 (2023)."},{"key":"e_1_3_2_1_7_1","volume-title":"INTED2013 Proceedings","author":"Caiza Julio C","year":"2013","unstructured":"Julio C Caiza and Jose M Del Alamo. 2013. Programming assignments automatic grading: review of tools and implementations. INTED2013 Proceedings (2013), 5691\u20135700."},{"key":"e_1_3_2_1_8_1","unstructured":"Anita Carleton Mark Klein John Robert Erin Harper Robert Cunningham Dionisio de Niz John Foreman John Goodenough James Herbsleb Ipek Ozkaya Douglas Schmidt and Forrest Shull. 2021. Architecting the Future of Software Engineering: A National Agenda for Software Engineering Research & Development. Accessed: 2023-Dec-7."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/S0360-1315(03)00030-7","article-title":"On automated grading of programming assignments in an academic institution","volume":"41","author":"Cheang Brenda","year":"2003","unstructured":"Brenda Cheang, Andy Kurnia, Andrew Lim, and Wee-Chong Oon. 2003. On automated grading of programming assignments in an academic institution. Computers & Education 41, 2 (2003), 121\u2013131.","journal-title":"Computers & Education"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the 51st ACM Technical Symposium on Computer Science Education. 563\u2013569","author":"Chen Binglin","year":"2020","unstructured":"Binglin Chen, Sushmita Azad, Rajarshi Haldar, Matthew West, and Craig Zilles. 2020. A validated scoring rubric for explain-in-plain-english questions. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education. 563\u2013569."},{"key":"e_1_3_2_1_11_1","volume-title":"International Conference on Artificial Intelligence in Education. Springer, 321\u2013327","author":"Chen Eason","year":"2023","unstructured":"Eason Chen, Ray Huang, Han-Shin Chen, Yuen-Hsien Tseng, and Liang-Yi Li. 2023. GPTutor: a ChatGPT-powered programming tool for code explanation. 
In International Conference on Artificial Intelligence in Education. Springer, 321\u2013327."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the Third (2016)","author":"Geigle Chase","year":"2016","unstructured":"Chase Geigle, ChengXiang Zhai, and Duncan C Ferguson. 2016. An exploration of automated grading of complex assignments. In Proceedings of the Third (2016) ACM Conference on Learning@ Scale. 351\u2013360."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 1st International Workshop on Large Language Models for Code. 102\u2013110","author":"Grandel Skyler","year":"2024","unstructured":"Skyler Grandel, Douglas C Schmidt, and Kevin Leach. 2024. Applying Large Language Models to Enhance the Assessment of Parallel Functional Programming Assignments. In Proceedings of the 1st International Workshop on Large Language Models for Code. 102\u2013110."},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 41st ACM technical symposium on Computer science education. 199\u2013203","author":"Hertz Matthew","year":"2010","unstructured":"Matthew Hertz. 2010. What do \"CS1\" and \"CS2\" mean? Investigating differences in the early courses. In Proceedings of the 41st ACM technical symposium on Computer science education. 199\u2013203."},{"key":"e_1_3_2_1_15_1","volume-title":"Olga C Santos, Mercedes T Rodrigo, Mutlu Cukurova, Ig Ibert Bittencourt, et al.","author":"Holmes Wayne","year":"2021","unstructured":"Wayne Holmes, Kaska Porayska-Pomsta, Ken Holstein, Emma Sutherland, Toby Baker, Simon Buckingham Shum, Olga C Santos, Mercedes T Rodrigo, Mutlu Cukurova, Ig Ibert Bittencourt, et al. 2021. Ethics of AI in education: Towards a community-wide framework. International Journal of Artificial Intelligence in Education (2021), 1\u201323."},{"key":"e_1_3_2_1_16_1","unstructured":"Stephen C Johnson. 1977. Lint a C program checker. 
Bell Telephone Laboratories Murray Hill."},{"key":"e_1_3_2_1_17_1","volume-title":"The use of scoring rubrics: Reliability, validity and educational consequences. Educational research review 2, 2","author":"Jonsson Anders","year":"2007","unstructured":"Anders Jonsson and Gunilla Svingby. 2007. The use of scoring rubrics: Reliability, validity and educational consequences. Educational research review 2, 2 (2007), 130\u2013144."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 1307\u20131316","author":"Kharkar Anant","year":"2022","unstructured":"Anant Kharkar, Roshanak Zilouchian Moghaddam, Matthew Jin, Xiaoyu Liu, Xin Shi, Colin Clement, and Neel Sundaresan. 2022. Learning to reduce false positives in analytic bug detectors. In Proceedings of the 44th International Conference on Software Engineering. 1307\u20131316."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1177\/00336882231162868","article-title":"ChatGPT for language teaching and learning","volume":"54","author":"Kohnke Lucas","year":"2023","unstructured":"Lucas Kohnke, Benjamin Luke Moorhouse, and Di Zou. 2023. ChatGPT for language teaching and learning. Relc Journal 54, 2 (2023), 537\u2013550.","journal-title":"Relc Journal"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.jcm.2016.02.012","article-title":"A guideline of selecting and reporting intraclass correlation coefficients for reliability research","volume":"15","author":"Koo Terry K","year":"2016","unstructured":"Terry K Koo and Mae Y Li. 2016. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15, 2 (2016), 155\u2013163.","journal-title":"Journal of chiropractic medicine"},{"key":"e_1_3_2_1_21_1","volume-title":"Camille Elepa\u00f1o, Maria Madriaga, Rimel Aggabao, Giezel Diaz-Candido, James Maningo, et al.","author":"Kung Tiffany H","year":"2023","unstructured":"Tiffany H Kung, Morgan Cheatham, Arielle Medenilla, Czarina Sillos, Lorie De Leon, Camille Elepa\u00f1o, Maria Madriaga, Rimel Aggabao, Giezel Diaz-Candido, James Maningo, et al. 2023. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS digital health 2, 2 (2023), e0000198."},{"key":"e_1_3_2_1_22_1","volume-title":"Forming inferences about some intraclass correlation coefficients. Psychological methods 1, 1","author":"McGraw Kenneth O","year":"1996","unstructured":"Kenneth O McGraw and Seok P Wong. 1996. Forming inferences about some intraclass correlation coefficients. Psychological methods 1, 1 (1996), 30."},{"key":"e_1_3_2_1_23_1","unstructured":"Gr\u00e9goire Mialon Roberto Dess\u00ec Maria Lomeli Christoforos Nalmpantis Ram Pasunuru Roberta Raileanu Baptiste Rozi\u00e8re Timo Schick Jane Dwivedi-Yu Asli Celikyilmaz et al. 2023. Augmented language models: a survey. arXiv preprint arXiv:2302.07842 (2023)."},{"key":"e_1_3_2_1_24_1","unstructured":"Mistral AI. 2024. Mistral Large 2. https:\/\/mistral.ai\/"},{"volume-title":"Cocomo ii forum","author":"Nguyen Vu","key":"e_1_3_2_1_25_1","unstructured":"Vu Nguyen, Sophia Deeds-Rubin, Thomas Tan, and Barry Boehm. 2007. A SLOC counting standard. In Cocomo ii forum, Vol. 2007. Citeseer, 1\u201316."},{"key":"e_1_3_2_1_26_1","unstructured":"OpenAI. 2022. ChatGPT. https:\/\/chat.openai.com\/"},{"key":"e_1_3_2_1_27_1","volume-title":"Carlos Anibal Suarez, and Michael Liut","author":"Orenstrakh Michael Sheinman","year":"2023","unstructured":"Michael Sheinman Orenstrakh, Oscar Karnalim, Carlos Anibal Suarez, and Michael Liut. 2023. 
Detecting llm-generated text in computing education: A comparative study for chatgpt cases. arXiv preprint arXiv:2307.07411 (2023)."},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the 2011 international symposium on software testing and analysis. 199\u2013209","author":"Parnin Chris","year":"2011","unstructured":"Chris Parnin and Alessandro Orso. 2011. Are automated debugging techniques actually helping programmers?. In Proceedings of the 2011 international symposium on software testing and analysis. 199\u2013209."},{"key":"e_1_3_2_1_29_1","volume-title":"2019 ASEE Annual Conference & Exposition.","author":"Perretta James","year":"2019","unstructured":"James Perretta, Westley Weimer, and Andrew DeOrio. 2019. Human vs. automated coding style grading in computing education. In 2019 ASEE Annual Conference & Exposition."},{"key":"e_1_3_2_1_30_1","volume-title":"2023 IEEE Global Engineering Education Conference (EDUCON). IEEE, 1\u20139.","author":"Qadir Junaid","year":"2023","unstructured":"Junaid Qadir. 2023. Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education. In 2023 IEEE Global Engineering Education Conference (EDUCON). IEEE, 1\u20139."},{"key":"e_1_3_2_1_31_1","volume-title":"Formative assessment and the design of instructional systems. Instructional science 18, 2","author":"Sadler D Royce","year":"1989","unstructured":"D Royce Sadler. 1989. Formative assessment and the design of instructional systems. Instructional science 18, 2 (1989), 119\u2013144."},{"key":"e_1_3_2_1_32_1","volume-title":"Proceedings of the 2022 ACM Conference on International Computing Education Research-Volume 1. 27\u201343","author":"Sarsa Sami","year":"2022","unstructured":"Sami Sarsa, Paul Denny, Arto Hellas, and Juho Leinonen. 2022. Automatic generation of programming exercises and code explanations using large language models. In Proceedings of the 2022 ACM Conference on International Computing Education Research-Volume 1. 
27\u201343."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1007\/s40593-022-00289-z","article-title":"Towards trustworthy autograding of short, multi-lingual, multi-type answers","volume":"33","author":"Schneider Johannes","year":"2023","unstructured":"Johannes Schneider, Robin Richner, and Micha Riser. 2023. Towards trustworthy autograding of short, multi-lingual, multi-type answers. International Journal of Artificial Intelligence in Education 33, 1 (2023), 88\u2013118.","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"e_1_3_2_1_34_1","volume-title":"Towards LLM-based Autograding for Short Textual Answers. arXiv preprint arXiv:2309.11508","author":"Schneider Johannes","year":"2023","unstructured":"Johannes Schneider, Bernd Schenk, Christina Niklaus, and Michaelis Vlachos. 2023. Towards LLM-based Autograding for Short Textual Answers. arXiv preprint arXiv:2309.11508 (2023)."},{"key":"e_1_3_2_1_35_1","first-page":"17","article-title":"Use chat gpt to solve programming bugs","volume":"31","author":"Shafiq Surameery Nigar M","year":"2023","unstructured":"Nigar M Shafiq Surameery and Mohammed Y Shakor. 2023. Use chat gpt to solve programming bugs. International Journal of Information Technology and Computer Engineering 31 (2023), 17\u201322.","journal-title":"International Journal of Information Technology and Computer Engineering"},{"key":"e_1_3_2_1_36_1","volume-title":"Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects. arXiv preprint arXiv:2305.03433","author":"Tan Kehui","year":"2023","unstructured":"Kehui Tan, Tianqi Pang, and Chenyou Fan. 2023. Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects. 
arXiv preprint arXiv:2305.03433 (2023)."},{"key":"e_1_3_2_1_37_1","volume-title":"Aras Bozkurt, Daniel T Hickey, Ronghuai Huang, and Brighter Agyemang.","author":"Tlili Ahmed","year":"2023","unstructured":"Ahmed Tlili, Boulus Shehata, Michael Agyemang Adarkwah, Aras Bozkurt, Daniel T Hickey, Ronghuai Huang, and Brighter Agyemang. 2023. What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart learning environments 10, 1 (2023), 15."},{"key":"e_1_3_2_1_38_1","volume-title":"ChatGPT: five priorities for research. Nature 614, 7947","author":"Van Dis Eva AM","year":"2023","unstructured":"Eva AM Van Dis, Johan Bollen, Willem Zuidema, Robert van Rooij, and Claudi L Bockting. 2023. ChatGPT: five priorities for research. Nature 614, 7947 (2023), 224\u2013226."},{"key":"e_1_3_2_1_39_1","volume-title":"Generalizing from a few examples: A survey on few-shot learning. ACM computing surveys (csur) 53, 3","author":"Wang Yaqing","year":"2020","unstructured":"Yaqing Wang, Quanming Yao, James T Kwok, and Lionel M Ni. 2020. Generalizing from a few examples: A survey on few-shot learning. ACM computing surveys (csur) 53, 3 (2020), 1\u201334."},{"key":"e_1_3_2_1_40_1","volume-title":"Chi, Quoc Le, and Denny Zhou","author":"Wei Jason","year":"2023","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL]"},{"key":"e_1_3_2_1_41_1","volume-title":"Proceedings of the 30th Conference on Pattern Languages of Programs","author":"White Jules","year":"2023","unstructured":"Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. 
Proceedings of the 30th Conference on Pattern Languages of Programs (2023)."},{"volume-title":"Generative AI for Effective Software Development","author":"White Jules","key":"e_1_3_2_1_42_1","unstructured":"Jules White, Sam Hays, Quchen Fu, Jesse Spencer-Smith, and Douglas C Schmidt. 2024. Chatgpt prompt patterns for improving code quality, refactoring, requirements elicitation, and software design. In Generative AI for Effective Software Development. Springer, 71\u2013108."},{"key":"e_1_3_2_1_43_1","volume-title":"Proceedings of the 46th International Conference on Software Engineering: Software Engineering Education and Training. 331\u2013341","author":"Xue Yuankai","year":"2024","unstructured":"Yuankai Xue, Hanlin Chen, Gina R Bai, Robert Tairas, and Yu Huang. 2024. Does ChatGPT Help With Introductory Programming? An Experiment of Students Using ChatGPT in CS1. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering Education and Training. 331\u2013341."},{"key":"e_1_3_2_1_44_1","unstructured":"Shunyu Yao Jeffrey Zhao Dian Yu Nan Du Izhak Shafran Karthik Narasimhan and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL]"},{"key":"e_1_3_2_1_45_1","volume-title":"Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba.","author":"Zhou Yongchao","year":"2022","unstructured":"Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. 2022. Large language models are human-level prompt engineers. 
arXiv preprint arXiv:2211.01910 (2022)."}],"event":{"name":"FSE Companion '25: 33rd ACM International Conference on the Foundations of Software Engineering","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering"],"location":"Clarion Hotel Trondheim Trondheim Norway","acronym":"FSE Companion '25"},"container-title":["Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3696630.3727236","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,28]],"date-time":"2025-07-28T19:13:04Z","timestamp":1753729984000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3696630.3727236"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,23]]},"references-count":45,"alternative-id":["10.1145\/3696630.3727236","10.1145\/3696630"],"URL":"https:\/\/doi.org\/10.1145\/3696630.3727236","relation":{},"subject":[],"published":{"date-parts":[[2025,6,23]]},"assertion":[{"value":"2025-07-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}