{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T19:31:10Z","timestamp":1772652670514,"version":"3.50.1"},"reference-count":61,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T00:00:00Z","timestamp":1754438400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T00:00:00Z","timestamp":1754438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Large language models (LLMs) have the potential to revolutionize various fields, including code development, robotics, finance, and education, due to their extensive prior knowledge and rapid advancements. This paper investigates how LLMs can be leveraged in engineering education. Specifically, we benchmark the capabilities of different LLMs, including GPT-3.5 Turbo, GPT-4o, and Llama 3 70B, in assessing homework for an undergraduate-level circuit analysis course. We have developed a novel dataset consisting of official reference solutions and real student solutions to problems from various topics in circuit analysis. To overcome the limitations of image recognition in current state-of-the-art LLMs, the solutions in the dataset are converted to LaTeX format. Using this dataset, a prompt template is designed to test five metrics of student solutions: completeness, method, final answer, arithmetic error, and units. 
The results show that GPT-4o and Llama 3 70B perform significantly better than GPT-3.5 Turbo across all five metrics, with GPT-4o and Llama 3 70B each having distinct advantages in different evaluation aspects. Additionally, we present insights into the limitations of current LLMs in several aspects of circuit analysis. Given the paramount importance of ensuring reliability in LLM-generated homework assessment to avoid misleading students, our results establish benchmarks and offer valuable insights for the development of a reliable, personalized tutor for circuit analysis\u2014a focus of our future work. Furthermore, the proposed evaluation methods can be generalized to a broader range of courses for engineering education in the future.<\/jats:p>","DOI":"10.1007\/s40593-025-00501-w","type":"journal-article","created":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T20:23:18Z","timestamp":1754511798000},"page":"3294-3355","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Benchmarking Large Language Models on Homework Assessment in Circuit Analysis"],"prefix":"10.1007","volume":"35","author":[{"given":"Liangliang","family":"Chen","sequence":"first","affiliation":[]},{"given":"Zhihao","family":"Qin","sequence":"additional","affiliation":[]},{"given":"Yiming","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Jacqueline","family":"Rohde","sequence":"additional","affiliation":[]},{"given":"Ying","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,8,6]]},"reference":[{"key":"501_CR1","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., et al. (2023). Gpt-4 technical report (pp. 1\u2013100). arXiv:2303.08774. https:\/\/doi.org\/10.48550\/arXiv.2303.08774. 
2303.08774"},{"issue":"3","key":"501_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.14569\/IJACSA.2019.0100328","volume":"10","author":"H Aldriye","year":"2019","unstructured":"Aldriye, H., Alkhalaf, A., & Alkhalaf, M. (2019). Automated grading systems for programming assignments: A literature review. International Journal of Advanced Computer Science and Applications, 10(3), 1\u20138. https:\/\/doi.org\/10.14569\/IJACSA.2019.0100328","journal-title":"International Journal of Advanced Computer Science and Applications"},{"key":"501_CR3","doi-asserted-by":"publisher","unstructured":"Ansari, A. N., Ahmad, S., & Bhutta, S. M. (2023). Mapping the global evidence around the use of chatgpt in higher education: A systematic scoping review. Education and Information Technologies, 1\u201341. https:\/\/doi.org\/10.1007\/s10639-023-12223-4","DOI":"10.1007\/s10639-023-12223-4"},{"issue":"2","key":"501_CR4","doi-asserted-by":"publisher","first-page":"343","DOI":"10.21093\/ijeltal.v7i2.1387","volume":"7","author":"R Baskara","year":"2023","unstructured":"Baskara, R. (2023). Exploring the implications of chatgpt for language learning in higher education. Indonesian Journal of English Language Teaching and Applied Linguistics, 7(2), 343\u2013358. https:\/\/doi.org\/10.21093\/ijeltal.v7i2.1387","journal-title":"Indonesian Journal of English Language Teaching and Applied Linguistics"},{"issue":"3","key":"501_CR5","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1111\/jcal.12793","volume":"39","author":"A Botelho","year":"2023","unstructured":"Botelho, A., Baral, S., Erickson, J. A., Benachamardi, P., & Heffernan, N. T. (2023). Leveraging natural language processing to support automated assessment and feedback for student open responses in mathematics. Journal of Computer Assisted Learning, 39(3), 823\u2013840. 
https:\/\/doi.org\/10.1111\/jcal.12793","journal-title":"Journal of Computer Assisted Learning"},{"key":"501_CR6","unstructured":"Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. In Proceedings of the 34th conference on neural information processing systems (neurips 2020) (pp. 1877\u20131901). Vancouver, Canada."},{"key":"501_CR7","doi-asserted-by":"publisher","unstructured":"Cai, H., Cai, X., Chang, J., Li, S., Yao, L., Wang, C., et al. (2024). Sciassess: Benchmarking llm proficiency in scientific literature analysis (pp. 1\u201322). arXiv:2403.01976. https:\/\/doi.org\/10.48550\/arXiv.2403.01976","DOI":"10.48550\/arXiv.2403.01976"},{"issue":"7","key":"501_CR8","doi-asserted-by":"publisher","first-page":"6075","DOI":"10.1109\/LRA.2024.3400189","volume":"9","author":"L Chen","year":"2024","unstructured":"Chen, L., Lei, Y., Jin, S., Zhang, Y., & Zhang, L. (2024). Rlingua: Improving reinforcement learning sample efficiency in robotic manipulations with large language models. IEEE Robotics and Automation Letters, 9(7), 6075\u20136082. https:\/\/doi.org\/10.1109\/LRA.2024.3400189","journal-title":"IEEE Robotics and Automation Letters"},{"key":"501_CR9","doi-asserted-by":"crossref","unstructured":"Chiang, C.-H., & Lee, H.-y. (2023). Can large language models be an alternative to human evaluations? In Proceedings of the 61st annual meeting of the association for computational linguistics (acl 2023) \u2013 volume 1: Long papers (pp. 15607\u201315631). Toronto, Canada.","DOI":"10.18653\/v1\/2023.acl-long.870"},{"key":"501_CR10","doi-asserted-by":"publisher","unstructured":"Cribben, I., & Zeinali, Y. (2023). The benefits and limitations of chatgpt in business education and research: A focus on management science, operations management and data analytics. Operations Management and Data Analytics (March 29, 2023), 1\u201348. 
https:\/\/doi.org\/10.2139\/ssrn.4404276","DOI":"10.2139\/ssrn.4404276"},{"key":"501_CR11","unstructured":"Du, M., Luu, A. T., Ji, B., & Ng, S.-K. (2024). Mercury: An efficiency benchmark for llm code synthesis. In 38th conference on neural information processing systems (neurips 2024) track on datasets and benchmarks (pp. 1\u201322). Vancouver, Canada."},{"issue":"4","key":"501_CR12","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/MS.2023.3265877","volume":"40","author":"C Ebert","year":"2023","unstructured":"Ebert, C., & Louridas, P. (2023). Generative ai for software practitioners. IEEE Software, 40(4), 30\u201338. https:\/\/doi.org\/10.1109\/MS.2023.3265877","journal-title":"IEEE Software"},{"issue":"1","key":"501_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.51219\/JAIMLD\/oluwole-fagbohun\/19","volume":"2","author":"O Fagbohun","year":"2024","unstructured":"Fagbohun, O., Iduwe, N., Abdullahi, M., Ifaturoti, A., & Nwanna, O. (2024). Beyond traditional assessment: Exploring the impact of large language models on grading practices. Journal of Artificial Intelligence and Machine Learning & Data Science, 2(1), 1\u20138. https:\/\/doi.org\/10.51219\/JAIMLD\/oluwole-fagbohun\/19","journal-title":"Journal of Artificial Intelligence and Machine Learning & Data Science"},{"key":"501_CR14","unstructured":"Frieder, S., Pinchetti, L., Griffiths, R.-R., Salvatori, T., Lukasiewicz, T., Petersen, P., & Berner, J. (2023). Mathematical capabilities of chatgpt. In Proceedings of the 37th conference on neural information processing systems (neurips 2023) track on datasets and benchmarks (pp. 1\u201346). New Orleans, USA."},{"key":"501_CR15","doi-asserted-by":"crossref","unstructured":"Guha, N., Nyarko, J., Ho, D., R\u00e9, C., Chilton, A., Chohlas-Wood, A., et al. (2023). Legalbench: A collaboratively built benchmark for measuring legal reasoning in large language models. 
In Proceedings of the 37th international conference on neural information processing system (neurips 2023) (pp. 44123\u201344279). New Orleans, USA.","DOI":"10.2139\/ssrn.4583531"},{"key":"501_CR16","unstructured":"Guo, T., Nan, B., Liang, Z., Guo, Z., Chawla, N., Wiest, O., et al. (2023). What can large language models do in chemistry? a comprehensive benchmark on eight tasks. In Proceedings of the 37th conference on neural information processing systems (neurips 2023) track on datasets and benchmarks (pp. 59662\u201359688). New Orleans, USA."},{"issue":"7","key":"501_CR17","doi-asserted-by":"publisher","first-page":"2163","DOI":"10.1109\/TCAD.2022.3217421","volume":"42","author":"K Hakhamaneshi","year":"2022","unstructured":"Hakhamaneshi, K., Nassar, M., Phielipp, M., Abbeel, P., & Stojanovic, V. (2022). Pretraining graph neural networks for few-shot analog circuit modeling and design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(7), 2163\u20132173. https:\/\/doi.org\/10.1109\/TCAD.2022.3217421","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"501_CR18","doi-asserted-by":"crossref","unstructured":"Hellas, A., Leinonen, J., Sarsa, S., Koutcheme, C., Kujanp\u00e4\u00e4, L., & Sorva, J. (2023). Exploring the responses of large language models to beginner programmers\u2019 help requests. In Proceedings of the 2023 acm conference on international computing education research (icer 2023) - volume 1 (pp. 93\u2013105). Chicago, USA.","DOI":"10.1145\/3568813.3600139"},{"issue":"2","key":"501_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3703155","volume":"43","author":"L Huang","year":"2025","unstructured":"Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., et al. (2025). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2), 1\u201355. 
https:\/\/doi.org\/10.1145\/3703155","journal-title":"ACM Transactions on Information Systems"},{"issue":"12","key":"501_CR20","doi-asserted-by":"publisher","first-page":"15873","DOI":"10.1007\/s10639-023-11834-1","volume":"28","author":"J Jeon","year":"2023","unstructured":"Jeon, J., & Lee, S. (2023). Large language models in education: A focus on the complementary relationship between human teachers and chatgpt. Education and Information Technologies, 28(12), 15873\u201315892. https:\/\/doi.org\/10.1007\/s10639-023-11834-1","journal-title":"Education and Information Technologies"},{"issue":"12","key":"501_CR21","doi-asserted-by":"publisher","first-page":"8622","DOI":"10.1109\/TKDE.2024.3469578","volume":"36","author":"B Jin","year":"2024","unstructured":"Jin, B., Liu, G., Han, C., Jiang, M., Ji, H., & Han, J. (2024). Large language models on graphs: A comprehensive survey. IEEE Transactions on Knowledge and Data Engineering, 36(12), 8622\u20138642. https:\/\/doi.org\/10.1109\/TKDE.2024.3469578","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"501_CR22","doi-asserted-by":"publisher","first-page":"102274","DOI":"10.1016\/j.lindif.2023.102274","volume":"103","author":"E Kasneci","year":"2023","unstructured":"Kasneci, E., Se\u00dfler, K., K\u00fcchemann, S., Bannert, M., Dementieva, D., Fischer, F., et al. (2023). Chatgpt for good? on opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. https:\/\/doi.org\/10.1016\/j.lindif.2023.102274","journal-title":"Learning and Individual Differences"},{"key":"501_CR23","doi-asserted-by":"publisher","unstructured":"Kevian, D., Syed, U., Guo, X., Havens, A., Dullerud, G., Seiler, P., & Hu, B. (2024). Capabilities of large language models in control engineering: A benchmark study on gpt-4, claude 3 opus, and gemini 1.0 ultra (pp. 1\u201326). arXiv:2404.03647. 
https:\/\/doi.org\/10.48550\/arXiv.2404.03647","DOI":"10.48550\/arXiv.2404.03647"},{"key":"501_CR24","doi-asserted-by":"crossref","unstructured":"Lan, A. S., Vats, D., Waters, A. E., & Baraniuk, R. G. (2015). Mathematical language processing: Automatic grading and feedback for open response mathematical questions. In Proceedings of the 2nd acm conference on learning @ scale (l@s 2015) (pp. 167\u2013176). Vancouver Canada.","DOI":"10.1145\/2724660.2724664"},{"key":"501_CR25","doi-asserted-by":"publisher","unstructured":"Lee, G.-G., Latif, E., Wu, X., Liu, N., & Zhai, X. (2024). Applying large language models and chain-of-thought for automatic scoring. Computers and Education: Artificial Intelligence, 100213. https:\/\/doi.org\/10.1016\/j.caeai.2024.100213","DOI":"10.1016\/j.caeai.2024.100213"},{"key":"501_CR26","doi-asserted-by":"crossref","unstructured":"Li, H., Li, C., Xing, W., Baral, S., & Heffernan, N. (2024). Automated feedback for student math responses based on multi-modality and fine-tuning. In Proceedings of the 14th learning analytics and knowledge conference (lak 2024) (pp. 763\u2013770). Kyoto, Japan.","DOI":"10.1145\/3636555.3636860"},{"key":"501_CR27","unstructured":"Liu, J., Zhou, P., Hua, Y., Chong, D., Tian, Z., Liu, A., et al. (2023). Benchmarking large language models on cmexam-a comprehensive chinese medical exam dataset. In 37th conference on neural information processing systems (neurips 2023) track on datasets and benchmarks (pp. 52430\u201352452). New Orleans, USA."},{"issue":"9","key":"501_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3560815","volume":"55","author":"P Liu","year":"2023","unstructured":"Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1\u201335. 
https:\/\/doi.org\/10.1145\/3560815","journal-title":"ACM Computing Surveys"},{"key":"501_CR29","doi-asserted-by":"crossref","unstructured":"Liu, T., Ding, W., Wang, Z., Tang, J., Huang, G. Y., & Liu, Z. (2019). Automatic short answer grading via multiway attention networks. In Proceedings of the 20th international conference on artificial intelligence in education (aied 2019) (pp. 169\u2013173). Chicago, USA.","DOI":"10.1007\/978-3-030-23207-8_32"},{"key":"501_CR30","unstructured":"Ma, Y. J., Liang, W., Wang, G., Huang, D.-A., Bastani, O., Jayaraman, D., & Anandkumar, A. (2024). Eureka: Human-level reward design via coding large language models. In Proceedings of the 12th international conference on learning representations (iclr 2024) (pp. 1\u201345). Vienna, Austria."},{"key":"501_CR31","unstructured":"Mei\u00dfner, N., Speth, S., Kieslinger, J., & Becker, S. (2024). Evalquiz\u2013llm-based automated generation of self-assessment quizzes in software engineering education. In Software engineering im unterricht der hochschulen 2024 (pp. 53\u201364). Bonn, Germany."},{"key":"501_CR32","doi-asserted-by":"crossref","unstructured":"Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the role of demonstrations: What makes in-context learning work? In Proceedings of the 2022 conference on empirical methods in natural language processing (emnlp 2022) (pp. 1\u201319). Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.759"},{"key":"501_CR33","doi-asserted-by":"publisher","unstructured":"Mustapha, K. B., Yap, E. H., & Abakr, Y. A. (2024). Bard, chatgpt and 3dgpt: A scientometric analysis of generative ai tools and assessment of implications for mechanical engineering education. Interactive Technology and Smart Education, 588\u2013624. 
https:\/\/doi.org\/10.1108\/ITSE-10-2023-0198","DOI":"10.1108\/ITSE-10-2023-0198"},{"key":"501_CR34","doi-asserted-by":"crossref","unstructured":"Nam, D., Macvean, A., Hellendoorn, V., Vasilescu, B., & Myers, B. (2024). Using an llm to help with code understanding. In Proceedings of the ieee\/acm 46th international conference on software engineering (icse 2024) (pp. 1\u201313). Lisbon, Portugal.","DOI":"10.1145\/3597503.3639187"},{"key":"501_CR35","doi-asserted-by":"publisher","unstructured":"Ngoc, T. N., Tran, Q. N., Tang, A., Nguyen, B., Nguyen, T., & Pham, T. (2023). Ai-assisted learning for electronic engineering courses in high education (pp. 1\u201313). arXiv:2311.01048. https:\/\/doi.org\/10.48550\/arXiv.2311.01048","DOI":"10.48550\/arXiv.2311.01048"},{"key":"501_CR36","unstructured":"Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., et al. (2022). Training language models to follow instructions with human feedback. In Proceedings of the 36th conference on neural information processing systems (neurips 2022) (pp. 27730\u201327744). New Orleans, USA."},{"issue":"3","key":"501_CR37","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/MS.2023.3248401","volume":"40","author":"I Ozkaya","year":"2023","unstructured":"Ozkaya, I. (2023). Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Software, 40(3), 4\u20138. https:\/\/doi.org\/10.1109\/MS.2023.3248401","journal-title":"IEEE Software"},{"key":"501_CR38","doi-asserted-by":"crossref","unstructured":"Qadir, J. (2023). Engineering education in the era of chatgpt: Promise and pitfalls of generative ai for education. In Proceedings of the 2023 ieee global engineering education conference (educon) (pp. 1\u20139). 
Kuwait, Kuwait.","DOI":"10.1109\/EDUCON54358.2023.10125121"},{"key":"501_CR39","doi-asserted-by":"publisher","unstructured":"Reid, M., Savinov, N., Teplyashin, D., Lepikhin, D., Lillicrap, T., Alayrac, J.-b., et al. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (pp. 1\u2013154). arXiv:2403.05530. https:\/\/doi.org\/10.48550\/arXiv.2403.05530","DOI":"10.48550\/arXiv.2403.05530"},{"key":"501_CR40","doi-asserted-by":"crossref","unstructured":"Rohde, J., Karyekar, S. P., Chen, L., Guo, Y., & Zhang, Y. (2024). Predictors of student academic success in an upper-level microelectronic circuits course. In Proceedings of the 2024 asee annual conference & exposition (asee 2024) (pp. 1\u201319). Portland, USA.","DOI":"10.18260\/1-2--47860"},{"key":"501_CR41","doi-asserted-by":"publisher","first-page":"e50945","DOI":"10.2196\/50945","volume":"9","author":"CW Safranek","year":"2023","unstructured":"Safranek, C. W., Sidamon-Eristoff, A. E., Gilson, A., & Chartash, D. (2023). The role of large language models in medical education: Applications and implications. JMIR Medical Education, 9, e50945. https:\/\/doi.org\/10.2196\/50945","journal-title":"JMIR Medical Education"},{"key":"501_CR42","doi-asserted-by":"crossref","unstructured":"Sui, Y., Zhou, M., Zhou, M., Han, S., & Zhang, D. (2024). Table meets llm: Can large language models understand structured table data? a benchmark and empirical study. In Proceedings of the 17th acm international conference on web search and data mining (wsdm 2024) (pp. 645\u2013654). Merida, Mexico.","DOI":"10.1145\/3616855.3635752"},{"key":"501_CR43","volume-title":"Introduction to electric circuits (9th edition)","author":"JA Svoboda","year":"2013","unstructured":"Svoboda, J. A., & Dorf, R. C. (2013). Introduction to electric circuits (9th edition). John Wiley & Sons."},{"key":"501_CR44","doi-asserted-by":"publisher","unstructured":"Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., et al. 
(2023). Gemini: A family of highly capable multimodal models (pp. 1\u201390). arXiv:2312.11805. https:\/\/doi.org\/10.48550\/arXiv.2312.11805","DOI":"10.48550\/arXiv.2312.11805"},{"issue":"8","key":"501_CR45","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","volume":"29","author":"AJ Thirunavukarasu","year":"2023","unstructured":"Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29(8), 1930\u20131940. https:\/\/doi.org\/10.1038\/s41591-023-02448-8","journal-title":"Nature Medicine"},{"key":"501_CR46","doi-asserted-by":"publisher","unstructured":"Tian, J., Hou, J., Wu, Z., Shu, P., Liu, Z., Xiang, Y., et al. (2024). Assessing large language models in mechanical engineering education: A study on mechanics-focused conceptual understanding (pp. 1\u201331). arXiv:2401.12983. https:\/\/doi.org\/10.48550\/arXiv.2401.12983","DOI":"10.48550\/arXiv.2401.12983"},{"key":"501_CR47","doi-asserted-by":"publisher","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., et al. (2023a). Llama: Open and efficient foundation language models (pp. 1\u201327). arXiv:2302.13971. https:\/\/doi.org\/10.48550\/arXiv.2302.13971","DOI":"10.48550\/arXiv.2302.13971"},{"key":"501_CR48","doi-asserted-by":"publisher","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., et al. (2023b). Llama 2: Open foundation and fine-tuned chat models (pp. 1\u201377). arXiv:2307.09288. https:\/\/doi.org\/10.48550\/arXiv.2307.09288","DOI":"10.48550\/arXiv.2307.09288"},{"key":"501_CR49","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.ece.2023.05.001","volume":"44","author":"M-L Tsai","year":"2023","unstructured":"Tsai, M.-L., Ong, C. W., & Chen, C.-L. (2023). 
Exploring the use of large language models (llms) in chemical engineering education: Building core course problem models with chat-gpt. Education for Chemical Engineers, 44, 71\u201395. https:\/\/doi.org\/10.1016\/j.ece.2023.05.001","journal-title":"Education for Chemical Engineers"},{"key":"501_CR50","unstructured":"Valmeekam, K., Marquez, M., Olmo, A., Sreedharan, S., & Kambhampati, S. (2023). Planbench: An extensible benchmark for evaluating large language models on planning and reasoning about change. In 37th conference on neural information processing systems (neurips 2023) track on datasets and benchmarks (pp. 38975\u201338987). New Orleans, USA."},{"key":"501_CR51","doi-asserted-by":"publisher","unstructured":"Wang, T., Zhou, N., & Chen, Z. (2024). Enhancing computer programming education with llms: A study on effective prompt engineering for python code generation (pp. 1\u201318). arXiv:2407.05437. https:\/\/doi.org\/10.48550\/arXiv.2407.05437","DOI":"10.48550\/arXiv.2407.05437"},{"key":"501_CR52","doi-asserted-by":"publisher","unstructured":"Xiao, C., Ma, W., Xu, S. X., Zhang, K., Wang, Y., & Fu, Q. (2024). From automation to augmentation: Large language models elevating essay scoring landscape (pp. 1\u201314). arXiv:2401.06431. https:\/\/doi.org\/10.48550\/arXiv.2401.06431","DOI":"10.48550\/arXiv.2401.06431"},{"key":"501_CR53","doi-asserted-by":"publisher","unstructured":"Xie, Q., Han, W., Chen, Z., Xiang, R., Zhang, X., He, Y., et al. (2024). The finben: An holistic financial benchmark for large language models (pp. 1\u201326). arXiv:2402.12659. https:\/\/doi.org\/10.48550\/arXiv.2402.12659","DOI":"10.48550\/arXiv.2402.12659"},{"key":"501_CR54","unstructured":"Xie, T., Zhao, S., Wu, C. H., Liu, Y., Luo, Q., Zhong, V., & Yu, T. (2024). Text2reward: Automated dense reward function generation for reinforcement learning. In Proceedings of the 12th international conference on learning representations (iclr 2024) (pp. 1\u201337). 
Vienna, Austria."},{"key":"501_CR55","doi-asserted-by":"publisher","unstructured":"Xie, W., Niu, J., Xue, C. J., & Guan, N. (2024). Grade like a human: Rethinking automated assessment with large language models (pp. 1\u201316). arXiv:2405.19694. https:\/\/doi.org\/10.48550\/arXiv.2405.19694","DOI":"10.48550\/arXiv.2405.19694"},{"key":"501_CR56","doi-asserted-by":"publisher","unstructured":"Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models (pp. 1\u201325). arXiv:2401.11817. https:\/\/doi.org\/10.48550\/arXiv.2401.11817","DOI":"10.48550\/arXiv.2401.11817"},{"key":"501_CR57","doi-asserted-by":"publisher","first-page":"51818","DOI":"10.1109\/ACCESS.2024.3385862","volume":"12","author":"Y Yamakaji","year":"2024","unstructured":"Yamakaji, Y., Shouno, H., & Fukushima, K. (2024). Circuit2graph: Circuits with graph neural networks. IEEE Access, 12, 51818\u201351827. https:\/\/doi.org\/10.1109\/ACCESS.2024.3385862","journal-title":"IEEE Access"},{"key":"501_CR58","doi-asserted-by":"crossref","unstructured":"Yancey, K. P., Laflair, G., Verardi, A., & Burstein, J. (2023). Rating short l2 essays on the cefr scale with gpt-4. In Proceedings of the 18th workshop on innovative use of nlp for building educational applications (bea 2023) (pp. 576\u2013584). Toronto, Canada.","DOI":"10.18653\/v1\/2023.bea-1.49"},{"key":"501_CR59","doi-asserted-by":"publisher","unstructured":"Yoo, H., Han, J., Ahn, S.-Y., & Oh, A. (2024). Dress: Dataset for rubric-based essay scoring on efl writing (pp. 1\u201313). arXiv:2402.16733. https:\/\/doi.org\/10.48550\/arXiv.2402.16733","DOI":"10.48550\/arXiv.2402.16733"},{"key":"501_CR60","doi-asserted-by":"publisher","unstructured":"Zhu, Y., Zhu, C., Wu, T., Wang, S., Zhou, Y., Chen, J., & Li, Y. (2025). Impact of assignment completion assisted by large language model-based chatbot on middle school students\u2019 learning. Education and Information Technologies, 2429\u20132461. 
https:\/\/doi.org\/10.1007\/s10639-024-12898-3","DOI":"10.1007\/s10639-024-12898-3"},{"issue":"1","key":"501_CR61","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1162\/coli_a_00502","volume":"50","author":"C Ziems","year":"2024","unstructured":"Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., & Yang, D. (2024). Can large language models transform computational social science? Computational Linguistics, 50(1), 237\u2013291. https:\/\/doi.org\/10.1162\/coli_a_00502","journal-title":"Computational Linguistics"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00501-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-025-00501-w","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00501-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:45Z","timestamp":1772647965000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-025-00501-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,6]]},"references-count":61,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["501"],"URL":"https:\/\/doi.org\/10.1007\/s40593-025-00501-w","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,6]]},"assertion":[{"value":"2 July 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 August 
2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}