{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T19:31:08Z","timestamp":1772652668930,"version":"3.50.1"},"reference-count":57,"publisher":"Elsevier BV","issue":"5","license":[{"start":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T00:00:00Z","timestamp":1752451200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T00:00:00Z","timestamp":1752451200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In mathematical proof education, there remains a need for interventions that help students learn to write mathematical proofs. Research has shown that timely feedback can be very helpful to students learning new skills. While for many years natural language processing models have struggled to perform well on tasks related to mathematical texts, recent developments in natural language processing have created the opportunity to complete the task of giving students instant feedback on their mathematical proofs. In this paper, we present a set of training methods and models capable of autograding freeform mathematical proofs by leveraging existing large language models and other machine learning techniques. The models are trained using proof data collected from four different proof by induction problems. We use four different robust large language models to compare their performances, and all achieve satisfactory performances to various degrees. Additionally, we recruit human graders to grade the same proofs as the training data, and find that the best grading model is also more accurate than most human graders in terms of grading accuracy. With the development of these grading models, we create and deploy an autograder for proof by induction problems and perform a user study with students. Results from the study shows that students are able to make significant improvements to their proofs using the feedback from the autograder, but students still do not trust the AI autograders as much as they trust human graders. Future work can improve on the autograder feedback and figure out ways to help students trust AI autograders.<\/jats:p>","DOI":"10.1007\/s40593-025-00498-2","type":"journal-article","created":{"date-parts":[[2025,7,13]],"date-time":"2025-07-13T23:24:16Z","timestamp":1752449056000},"page":"3202-3232","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Autograding Mathematical Induction Proofs with Natural Language Processing"],"prefix":"10.1016","volume":"35","author":[{"given":"Chenyan","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Mariana","family":"Silva","sequence":"additional","affiliation":[]},{"given":"Seth","family":"Poulsen","sequence":"additional","affiliation":[]}],"member":"78","published-online":{"date-parts":[[2025,7,14]]},"reference":[{"issue":"2","key":"498_CR1","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1207\/s15327809jls0402_2","volume":"4","author":"JR Anderson","year":"1995","unstructured":"Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. The Journal of the Learning Sciences, 4(2), 167\u2013207.","journal-title":"The Journal of the Learning Sciences"},{"key":"498_CR2","doi-asserted-by":"crossref","unstructured":"Azad, S., Chen, B., Fowler, M., West, M., & Zilles, C. (2020). Strategies for deploying unreliable ai graders in high-transparency high-stakes exams. In Artificial intelligence in education: 21st international conference, AIED 2020, Ifrane, Morocco, July 6\u201310, 2020, Proceedings, Part I 21 (pp. 16\u201328). Springer.","DOI":"10.1007\/978-3-030-52237-7_2"},{"key":"498_CR3","doi-asserted-by":"publisher","unstructured":"Azerbayev, Z., Schoelkopf, H., Paster, K., Santos, M. D., McAleer, S., Jiang, A. Q., Deng, J., Biderman, S., & Welleck, S. (2023). Llemma: An open language model for mathematics. arXiv:2310.10631. https:\/\/doi.org\/10.48550\/arXiv.2310.10631. http:\/\/arxiv.org\/abs\/2310.10631. Accessed 29 Nov 2023.","DOI":"10.48550\/arXiv.2310.10631"},{"key":"498_CR4","unstructured":"Baker, J. D. (1996). Students\u2019 difficulties with proof by mathematical induction."},{"key":"498_CR5","doi-asserted-by":"publisher","unstructured":"Bonthu, S., Sripada, R. S., & Prasad, M. (2021). Automated short answer grading using deep learning: A survey, pp. 61\u201378. https:\/\/doi.org\/10.1007\/978-3-030-84060-0_5","DOI":"10.1007\/978-3-030-84060-0_5"},{"key":"498_CR6","doi-asserted-by":"publisher","unstructured":"Breitner, J. (2016). Visual theorem proving with the incredible proof machine, pp. 123\u2013139. https:\/\/doi.org\/10.1007\/978-3-319-43144-4_8","DOI":"10.1007\/978-3-319-43144-4_8"},{"key":"498_CR7","doi-asserted-by":"publisher","unstructured":"Brown, S. A. (2014). On skepticism and its role in the development of proof in the classroom. Educational Studies in Mathematics, 86(3), 311\u2013335.https:\/\/doi.org\/10.1007\/s10649-014-9544-4. Accessed 19 Jan 2022.","DOI":"10.1007\/s10649-014-9544-4"},{"key":"498_CR8","unstructured":"Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language models are few-shot learners. arXiv:2005.14165. http:\/\/arxiv.org\/abs\/2005.14165. Accessed 10 Jan 2024."},{"key":"498_CR9","doi-asserted-by":"publisher","unstructured":"Castelo, N., Bos, M. W., & Lehmann, D. R. (2019). Task-dependent algorithm aversion. Journal of Marketing Research, 56(5), 809\u2013825. https:\/\/doi.org\/10.1177\/0022243719851788. Publisher: SAGE Publications Inc., Accessed 16 May 2024.","DOI":"10.1177\/0022243719851788"},{"key":"498_CR10","doi-asserted-by":"publisher","unstructured":"Chen, B., West, M., & Zilles, C. (2022). Peer-grading \u201cexplain in plain english\u201d: A bayesian calibration method for categorical answers. In Proceedings of the 53rd ACM technical symposium on computer science education - Volume 1. SIGCSE 2022 (pp. 133\u2013139). New York, NY, USA: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/3478431.3499409","DOI":"10.1145\/3478431.3499409"},{"key":"498_CR11","volume-title":"Computer science curricula 2013: Curriculum guidelines for undergraduate degree programs in computer science","author":"Computing Curricula, A. f. C. M. A., & Society, I. C","year":"2013","unstructured":"Computing Curricula, A. f. C. M. A., & Society, I. C. (2013). Computer science curricula 2013: Curriculum guidelines for undergraduate degree programs in computer science. New York, NY, USA: Association for Computing Machinery."},{"key":"498_CR12","volume-title":"Curriculum guidelines for undergraduate degree programs in software engineering","author":"TJTF Computing Curricula","year":"2014","unstructured":"Computing Curricula, T. J. T. F. (2014). Curriculum guidelines for undergraduate degree programs in software engineering. New York, NY, USA: Technical report."},{"key":"498_CR13","volume-title":"Curriculum guidelines for undergraduate degree programs in computer engineering","author":"Computing Curricula, A. f. C. M. A., & Society, I. C","year":"2016","unstructured":"Computing Curricula, A. f. C. M. A., & Society, I. C. (2016). Curriculum guidelines for undergraduate degree programs in computer engineering. New York, NY, USA: Technical report."},{"key":"498_CR14","unstructured":"Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. http:\/\/arxiv.org\/abs\/1810.04805. Accessed 09 Sept 2023."},{"key":"498_CR15","unstructured":"Dubinsky, E. (1986). Teaching mathematical induction: I. The Journal of Mathematical Behavior, 5(3), 305\u2013317. Place: Netherlands Publisher: Elsevier Science."},{"key":"498_CR16","unstructured":"Dubinsky, E. (1989). Teaching mathematical induction II. Journal of Mathematical Behavior, 8(3), 285\u2013304. Accessed 23 Dec 2024."},{"key":"498_CR17","doi-asserted-by":"publisher","unstructured":"Fernandez, N., Ghosh, A., Liu, N., Wang, Z., Choffin, B., Baraniuk, R., & Lan, A. (2023). Automated scoring for reading comprehension via in-context BERT tuning. arXiv:2205.09864. https:\/\/doi.org\/10.48550\/arXiv.2205.09864. http:\/\/arxiv.org\/abs\/2205.09864. Accessed 17 Dec 2024.","DOI":"10.48550\/arXiv.2205.09864"},{"key":"498_CR18","doi-asserted-by":"publisher","unstructured":"Filighera, A., Parihar, S., Steuer, T., Meuser, T., & Ochs, S. (2022). Your answer is incorrect... would you like to know why? Introducing a bilingual short answer feedback dataset. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers) (pp. 8577\u20138591). Dublin, Ireland: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.587","DOI":"10.18653\/v1\/2022.acl-long.587"},{"key":"498_CR19","unstructured":"Fluckiger, H., Nelson, E., Kularajan, S., & Poulsen, S. (2025). Cognitive gaps in proof by induction: A replication study. In Proceedings of the 27th annual conference on research in undergradute mathematics education. Alexandria, Virginia: SIGMAA on RUME."},{"key":"498_CR20","doi-asserted-by":"publisher","unstructured":"Fowler, M., Chen, B., Azad, S., West, M., & Zilles, C. (2021). Autograding \u201cExplain in Plain English\u201d questions using NLP. In Proceedings of the 52nd ACM technical symposium on computer science education (pp. 1163\u20131169). Virtual Event USA: ACM. https:\/\/doi.org\/10.1145\/3408877.3432539. https:\/\/dl.acm.org\/doi\/10.1145\/3408877.3432539. Accessed 19 Dec 2023.","DOI":"10.1145\/3408877.3432539"},{"key":"498_CR21","doi-asserted-by":"publisher","unstructured":"Goldman, K., Gross, P., Heeren, C., Herman, G., Kaczmarczyk, L., Loui, M. C., & Zilles, C. (2008). Identifying important and difficult concepts in introductory computing courses using a delphi process. In Proceedings of the 39th SIGCSE technical symposium on computer science education. SIGCSE \u201908 (pp. 256\u2013260). New York, NY, USA: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/1352135.1352226. https:\/\/dl.acm.org\/doi\/10.1145\/1352135.1352226. Accessed 05 Dec 2023.","DOI":"10.1145\/1352135.1352226"},{"key":"498_CR22","doi-asserted-by":"publisher","unstructured":"Ha, T., & Kim, S. (2023). Improving trust in AI with mitigating confirmation bias: Effects of explanation type and debiasing strategy for decision-making with explainable AI. International Journal of Human-Computer Interaction, 0(0), 1\u201312. https:\/\/doi.org\/10.1080\/10447318.2023.2285640. Publisher: Taylor & Francis. Accessed 16 May 2024.","DOI":"10.1080\/10447318.2023.2285640"},{"key":"498_CR23","doi-asserted-by":"publisher","unstructured":"Haller, S., Aldea, A., Seifert, C., & Strisciuglio, N. (2022). Survey on automated short answer grading with deep learning: from word embeddings to transformers. arXiv:2204.03503. https:\/\/doi.org\/10.48550\/arXiv.2204.03503. http:\/\/arxiv.org\/abs\/2204.03503. Accessed 15 Nov 2023.","DOI":"10.48550\/arXiv.2204.03503"},{"key":"498_CR24","unstructured":"Harel, G. (2001). The development of mathematical induction as a proof scheme: A model for dnr-based instruction. In S. Campbell & R. Zaskis (Eds.), Learning and teaching number theory, journal of mathematical behavior. Citeseer."},{"key":"498_CR25","doi-asserted-by":"publisher","unstructured":"Hsu, S., Li, T. W., Zhang, Z., Fowler, M., Zilles, C., & Karahalios, K. (2021). Attitudes surrounding an imperfect ai autograder. In Proceedings of the 2021 CHI conference on human factors in computing systems. CHI \u201921. New York, NY, USA: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/3411764.3445424","DOI":"10.1145\/3411764.3445424"},{"key":"498_CR26","doi-asserted-by":"publisher","DOI":"10.1007\/s40593-024-00453-7","author":"C Impey","year":"2025","unstructured":"Impey, C., Wenger, M., Garuda, N., Golchin, S., & Stamer, S. (2025). Using large language models for automated grading of student writing about science. International Journal of Artificial Intelligence in Education. https:\/\/doi.org\/10.1007\/s40593-024-00453-7","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"498_CR27","doi-asserted-by":"publisher","unstructured":"Jahnke, H. N., & Wambach, R. (2013). Understanding what a proof is: A classroom-based approach. ZDM, 45(3), 469\u2013482. https:\/\/doi.org\/10.1007\/s11858-013-0502-x. Accessed 18 Jan 2022.","DOI":"10.1007\/s11858-013-0502-x"},{"key":"498_CR28","doi-asserted-by":"publisher","unstructured":"Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R. S., Torralba, A., Urtasun, R., & Fidler, S. (2015). Skip-thought vectors. arXiv:1506.06726. https:\/\/doi.org\/10.48550\/arXiv.1506.06726. http:\/\/arxiv.org\/abs\/1506.06726. Accessed 26 Nov 2023.","DOI":"10.48550\/arXiv.1506.06726"},{"key":"498_CR29","doi-asserted-by":"crossref","unstructured":"Knuth, E. J. (2002). Secondary school mathematics teachers\u2019 conceptions of proof. Journal for Research in Mathematics Education, 33(5), 379\u2013405. Accessed 31 Dec 2024.","DOI":"10.2307\/4149959"},{"key":"498_CR30","doi-asserted-by":"publisher","unstructured":"Lan, A. S., Vats, D., Waters, A. E., & Baraniuk, R. G. (2015). Mathematical language processing: Automatic grading and feedback for open response mathematical questions. In Proceedings of the second (2015) ACM conference on learning @ scale. L@S \u201915 (pp. 167\u2013176). New York, NY, USA: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/2724660.2724664. https:\/\/dl.acm.org\/doi\/10.1145\/2724660.2724664. Accessed 15 Nov 2023.","DOI":"10.1145\/2724660.2724664"},{"issue":"2","key":"498_CR31","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1093\/analys\/anp002","volume":"69","author":"M Lange","year":"2009","unstructured":"Lange, M. (2009). Why proofs by mathematical induction are generally not explanatory. Analysis, 69(2), 203\u2013211. https:\/\/doi.org\/10.1093\/analys\/anp002","journal-title":"Analysis"},{"key":"498_CR32","doi-asserted-by":"publisher","unstructured":"Larsen, S., & Zandieh, M. (2008). Proofs and refutations in the undergraduate mathematics classroom. Educational Studies in Mathematics, 67(3), 205\u2013216. https:\/\/doi.org\/10.1007\/s10649-007-9106-0. Accessed 19 Jan 2022.","DOI":"10.1007\/s10649-007-9106-0"},{"key":"498_CR33","doi-asserted-by":"publisher","unstructured":"Lee, M. K. (2018). Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management. Big Data & Society, 5(1), 2053951718756684. https:\/\/doi.org\/10.1177\/2053951718756684. Publisher: SAGE Publications Ltd. Accessed 16 May 2024.","DOI":"10.1177\/2053951718756684"},{"key":"498_CR34","doi-asserted-by":"crossref","unstructured":"Lerner, S., Foster, S. R., & Griswold, W. G. (2015). Polymorphic blocks: Formalism-inspired ui for structured connectors. In Proceedings of the 33rd annual ACM conference on human factors in computing systems (pp. 3063\u20133072).","DOI":"10.1145\/2702123.2702302"},{"key":"498_CR35","doi-asserted-by":"publisher","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K\u00fcttler, H., Lewis, M., Yih, W.-t., Rockt\u00e4schel, T., Riedel, S., & Kiela, D. (2021). Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv:2005.11401. https:\/\/doi.org\/10.48550\/arXiv.2005.11401. http:\/\/arxiv.org\/abs\/2005.11401. Accessed 28 May 2024.","DOI":"10.48550\/arXiv.2005.11401"},{"key":"498_CR36","doi-asserted-by":"publisher","unstructured":"Li, T. W., Hsu, S., Fowler, M., Zhang, Z., Zilles, C., & Karahalios, K. (2023). Am i wrong, or is the autograder wrong? effects of ai grading mistakes on learning. In Proceedings of the 2023 ACM conference on international computing education research - volume 1. ICER \u201923 (pp. 159\u2013176). New York, NY, USA: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/3568813.3600124","DOI":"10.1145\/3568813.3600124"},{"key":"498_CR37","unstructured":"Mohler, M., Bunescu, R., & Mihalcea, R. (2011). Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In D. Lin, Y. Matsumoto, & R. Mihalcea (Eds.), Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies (pp. 752\u2013762). Portland, Oregon, USA: Association for Computational Linguistics. https:\/\/aclanthology.org\/P11-1076\/. Accessed 26 Jan 2025."},{"key":"498_CR38","unstructured":"Movshovitz-Hadar, N. (1993). The false coin problem, mathematical induction and knowledge fragility. Journal of Mathematical Behavior, 12(3), 253\u2013268. Accessed 23 Dec 2024."},{"key":"498_CR39","unstructured":"Nielsen, R. D., Ward, W., Martin, J., & Palmer, M. (2008). Annotating students\u2019 understanding of science concepts. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the sixth international conference on language resources and evaluation (LREC\u201808). Marrakech, Morocco: European Language Resources Association (ELRA). https:\/\/aclanthology.org\/L08-1166\/. Accessed 26 Jan 2025."},{"key":"498_CR40","doi-asserted-by":"crossref","unstructured":"Norton, A., Arnold, R., Kokushkin, V., & Tiraphatna, M. (2022). Addressing the cognitive gap in mathematical induction. International Journal of Research in Undergraduate Mathematics Education, 1\u201327.","DOI":"10.1007\/s40753-022-00163-2"},{"key":"498_CR41","unstructured":"OpenAI (2023). Embeddings. https:\/\/platform.openai.com\/docs\/guides\/embeddings\/what-are-embeddings. Accessed January 2024."},{"key":"498_CR42","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). Pytorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems 32 (pp. 8024\u20138035). Vancuver, Canada: Curran Associates, Inc. http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"498_CR43","unstructured":"Poulsen, S., Chen, H., Gertner, Y., Cosman, B., West, M., & Herman, G. L. (2023b). Measuring the impact of distractors on student learning gains while using proof blocks."},{"key":"498_CR44","doi-asserted-by":"crossref","unstructured":"Poulsen, S., Gertner, Y., Chen, H., Cosman, B., West, M., & Herman, G. L. (2024). Disentangling the learning gains from reading a book chapter and completing proof blocks problems. In Proceedings of the 55th ACM technical symposium on computer science education V. 1.","DOI":"10.1145\/3626252.3630831"},{"key":"498_CR45","doi-asserted-by":"crossref","unstructured":"Poulsen, S., Gertner, Y., Cosman, B., West, M., & Herman, G. L. (2023a) Efficiency of learning from proof blocks versus writing proofs. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (pp. 472\u2013478).","DOI":"10.1145\/3545945.3569797"},{"key":"498_CR46","doi-asserted-by":"publisher","unstructured":"Poulsen, S., Viswanathan, M., Herman, G. L., & West, M. (2022). Proof blocks: Autogradable scaffolding activities for learning to write proofs. In Proceedings of the 27th ACM conference on innovation and technology in computer science education (Vol. 1, pp. 428\u2013434). https:\/\/doi.org\/10.1145\/3502718.3524774. arXiv:2106.11032. http:\/\/arxiv.org\/abs\/2106.11032. Accessed 09 Sept 2023.","DOI":"10.1145\/3502718.3524774"},{"key":"498_CR47","doi-asserted-by":"publisher","unstructured":"Poulsen, S. (2024). Student proof by induction data set. https:\/\/doi.org\/10.7910\/DVN\/OTRLXF","DOI":"10.7910\/DVN\/OTRLXF"},{"key":"498_CR48","doi-asserted-by":"publisher","unstructured":"Shen, J. T., Yamashita, M., Prihar, E., Heffernan, N., Wu, X., Graff, B., & Lee, D. (2023). MathBERT: A pre-trained language model for general nlp tasks in mathematics education. arXiv:2106.07340. https:\/\/doi.org\/10.48550\/arXiv.2106.07340. http:\/\/arxiv.org\/abs\/2106.07340. Accessed 17 Aug 2023.","DOI":"10.48550\/arXiv.2106.07340"},{"issue":"1","key":"498_CR49","doi-asserted-by":"publisher","first-page":"153","DOI":"10.3102\/0034654307313795","volume":"78","author":"VJ Shute","year":"2008","unstructured":"Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153\u2013189. https:\/\/doi.org\/10.3102\/0034654307313795","journal-title":"Review of Educational Research"},{"key":"498_CR50","doi-asserted-by":"crossref","unstructured":"Stylianides, G. J., & Stylianides, A. J. (2009). Facilitating the transition from empirical arguments to proof. Journal for Research in Mathematics Education, 40(3), 314\u2013352. Publisher: National Council of Teachers of Mathematics. Accessed 18 Jan 2022.","DOI":"10.5951\/jresematheduc.40.3.0314"},{"key":"498_CR51","doi-asserted-by":"publisher","unstructured":"Stylianides, G. J., Sandefur, J., & Watson, A. (2016). Conditions for proving by mathematical induction to be explanatory. The Journal of Mathematical Behavior, 43, 20\u201334 . https:\/\/doi.org\/10.1016\/j.jmathb.2016.04.002. Accessed 23 Dec 2024.","DOI":"10.1016\/j.jmathb.2016.04.002"},{"key":"498_CR52","unstructured":"Stylianides, G., Stylianides, A., & Weber, K. (2017). Research on the teaching and learning of proof: Taking stock and moving forward. In J. Cai (Ed.), Compendium for research in mathematics education (pp. 237\u2013266). Reston, VA: National Council of Teachers of Mathematics. Chap. 10."},{"key":"498_CR53","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1007\/s10857-007-9034-z","volume":"10","author":"GJ Stylianides","year":"2007","unstructured":"Stylianides, G. J., Stylianides, A. J., & Philippou, G. N. (2007). Preservice teachers\u2019 knowledge of proof by mathematical induction. Journal of Mathematics Teacher Education, 10, 145\u2013166.","journal-title":"Journal of Mathematics Teacher Education"},{"key":"498_CR54","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P.S., Lachaux, M.-A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E. M., Subramanian, R., Tan, X. E., Tang, B., Taylor, R., Williams, A., Kuan, J. X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., & Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. http:\/\/arxiv.org\/abs\/2307.09288. Accessed 10 Jan 2024."},{"key":"498_CR55","doi-asserted-by":"publisher","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). Attention is all you need. arXiv:1706.03762. https:\/\/doi.org\/10.48550\/arXiv.1706.03762. http:\/\/arxiv.org\/abs\/1706.03762. Accessed 26 Nov 2023.","DOI":"10.48550\/arXiv.1706.03762"},{"key":"498_CR56","doi-asserted-by":"crossref","unstructured":"West, M., Herman, G. L., & Zilles, C. (2015). Prairielearn: Mastery-based online problem solving with adaptive scoring and recommendations driven by machine learning. In 2015 ASEE annual conference & exposition (pp. 26\u20131238126123814). Seattle, Washington: ASEE Conferences. https:\/\/peer.asee.org\/24575","DOI":"10.18260\/p.24575"},{"key":"498_CR57","doi-asserted-by":"publisher","unstructured":"Wu, Y., Jiang, A. Q., Li, W., Rabe, M. N., Staats, C., Jamnik, M., & Szegedy, C. (2022). Autoformalization with large language models. arXiv:2205.12615. https:\/\/doi.org\/10.48550\/arXiv.2205.12615. http:\/\/arxiv.org\/abs\/2205.12615. Accessed 25 March 2024.","DOI":"10.48550\/arXiv.2205.12615"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00498-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-025-00498-2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00498-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:45Z","timestamp":1772647965000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-025-00498-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,14]]},"references-count":57,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["498"],"URL":"https:\/\/doi.org\/10.1007\/s40593-025-00498-2","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,14]]},"assertion":[{"value":"17 June 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 July 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}