{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T00:32:38Z","timestamp":1776472358471,"version":"3.51.2"},"reference-count":98,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T00:00:00Z","timestamp":1769040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Future AI Research (FAIR) project of the National Recovery and Resilience Plan"},{"name":"European Union\u2013NextGenerationEU"},{"DOI":"10.13039\/501100001665","name":"French National Research Agency","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Large Language Models (LLMs) have demonstrated impressive capabilities in structured reasoning and symbolic tasks, with coding emerging as a particularly successful application. This progress has naturally motivated efforts to extend these models to mathematics, both in its traditional form, expressed through natural-style mathematical language, and in its formalized counterpart, expressed in a symbolic syntax suitable for automatic verification. Yet, despite apparent parallels between programming and proof construction, advances in formalized mathematics have proven significantly more challenging. This gap raises fundamental questions about the nature of reasoning in current LLM architectures, the role of supervision and feedback, and the extent to which such models maintain an internal notion of computational or deductive state. In this article, we review the current state-of-the-art in mathematical reasoning with LLMs, focusing on recent models and benchmarks. We explore three central issues at the intersection of machine learning and mathematical cognition: (i) the trade-offs between traditional and formalized mathematics as training and evaluation domains; (ii) the structural and methodological reasons why proof synthesis remains more brittle than code generation; and (iii) whether LLMs genuinely represent or merely emulate a notion of evolving logical state. Our goal is not to draw rigid distinctions but to clarify the present boundaries of these systems and outline promising directions for their extension.<\/jats:p>","DOI":"10.3390\/bdcc10010038","type":"journal-article","created":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T17:52:38Z","timestamp":1769190758000},"page":"38","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Thinking Machines: Mathematical Reasoning in the Age of LLMs"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9677-6350","authenticated-orcid":false,"given":"Andrea","family":"Asperti","sequence":"first","affiliation":[{"name":"Department of Informatics\u2014Science and Engineering (DISI), University of Bologna, Via Mura Anteo Zamboni 7, 40126 Bologna, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5335-6412","authenticated-orcid":false,"given":"Alberto","family":"Naibo","sequence":"additional","affiliation":[{"name":"Department of Philosophy, University Paris 1 Panth\u00e9on-Sorbonne, 17 Rue de la Sorbonne, 75995 Paris, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4360-6016","authenticated-orcid":false,"given":"Claudio","family":"Sacerdoti Coen","sequence":"additional","affiliation":[{"name":"Department of Informatics\u2014Science and Engineering (DISI), University of Bologna, Via Mura Anteo Zamboni 7, 40126 Bologna, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1145\/3708519","article-title":"Automatic Programming: Large Language Models and Beyond","volume":"34","author":"Lyu","year":"2025","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1007\/s42979-025-04241-5","article-title":"Usage of Large Language Model for Code Generation Tasks: A Review","volume":"6","author":"Bistarelli","year":"2025","journal-title":"SN Comput. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Xia, C.S., Wei, Y., and Zhang, L. (2023, January 14\u201320). Automated Program Repair in the Era of Large Pre-trained Language Models. Proceedings of the 45th IEEE\/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia.","DOI":"10.1109\/ICSE48619.2023.00129"},{"key":"ref_4","unstructured":"Novikov, A., Vu, N., Eisenberger, M., Dupont, E., Huang, P., Wagner, A.Z., Shirobokov, S., Kozlovskii, B., Ruiz, F.J.R., and Mehrabian, A. (2025). AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv."},{"key":"ref_5","unstructured":"Boye, J., and Mo\u00ebll, B. (2025). Large Language Models and Mathematical Reasoning Failures. arXiv."},{"key":"ref_6","unstructured":"Zhou, Y., Liu, H., Chen, Z., Tian, Y., and Chen, B. (2025, January 13\u201319). GSM-\u221e: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length?. Proceedings of the Forty-Second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada."},{"key":"ref_7","unstructured":"Glazer, E., Erdil, E., Besiroglu, T., Chicharro, D., Chen, E., Gunning, A., Olsson, C.F., Denain, J., Ho, A., and de Oliveira Santos, E. (2024). FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. arXiv."},{"key":"ref_8","first-page":"12","article-title":"Algorithm and Abstraction in Formal Mathematics","volume":"Volume 14749","author":"Buzzard","year":"2024","journal-title":"Mathematical Software\u2014ICMS 2024, Proceedings of the 8th International Conference, Durham, UK, 22\u201325 July 2024"},{"key":"ref_9","unstructured":"Kaufmann, T., Weng, P., Bengs, V., and H\u00fcllermeier, E. (2025). A Survey of Reinforcement Learning from Human Feedback. Trans. Mach. Learn. Res., Available online: https:\/\/openreview.net\/pdf?id=f7OkIurx4b."},{"key":"ref_10","unstructured":"DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., and Wang, P. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv."},{"key":"ref_11","unstructured":"Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Zhang, M., Li, Y.K., Wu, Y., and Guo, D. (2024). DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv."},{"key":"ref_12","unstructured":"Xin, H., Ren, Z.Z., Song, J., Shao, Z., Zhao, W., Wang, H., Liu, B., Zhang, L., Lu, X., and Du, Q. (2025, January 24\u201328). DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search. Proceedings of the Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore."},{"key":"ref_13","unstructured":"Duh, K., G\u00f3mez-Adorno, H., and Bethard, S. (2024). From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs After Instruction Tuning. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024, Mexico City, Mexico, 16\u201321 June 2024, Association for Computational Linguistics."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1016\/j.tics.2024.01.011","article-title":"Dissociating language and thought in large language models","volume":"28","author":"Mahowald","year":"2024","journal-title":"Trends Cogn. Sci."},{"key":"ref_15","first-page":"53728","article-title":"Direct preference optimization: Your language model is secretly a reward model","volume":"36","author":"Rafailov","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","unstructured":"Wang, Z., Bukharin, A., Delalleau, O., Egert, D., Shen, G., Zeng, J., Kuchaiev, O., and Dong, Y. (2025, January 24\u201328). HelpSteer2-Preference: Complementing Ratings with Preferences. Proceedings of the Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liu, Z., Chen, Y., Shoeybi, M., Catanzaro, B., and Ping, W. (2025). AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling. Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, 27 July\u20131 August 2025, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2025.findings-acl.206"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xu, H., Mao, X., Li, F., Wu, X., Chen, W., Zhang, W., and Luu, A.T. (2025). Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning. Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, 27 July\u20131 August 2025, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2025.findings-acl.1249"},{"key":"ref_19","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lyu, Q., Havaldar, S., Stein, A., Zhang, L., Rao, D., Wong, E., Apidianaki, M., and Callison-Burch, C. (2023). Faithful Chain-of-Thought Reasoning. The 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP 2023\u2014Volume 1: Long Papers, Nusa Dua, Bali, 1\u20134 November 2023, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2023.ijcnlp-main.20"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Xu, J., Fei, H., Pan, L., Liu, Q., Lee, M., and Hsu, W. (2024). Faithful Logical Reasoning via Symbolic Chain-of-Thought. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, 11\u201316 August 2024, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2024.acl-long.720"},{"key":"ref_22","first-page":"34201","article-title":"Data selection for language models via importance resampling","volume":"36","author":"Xie","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","unstructured":"Zhang, T., Yu, T., Hashimoto, T., Lewis, M., Yih, W.T., Fried, D., and Wang, S. (2023, January 23\u201329). Coder reviewer reranking for code generation. Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sun, W., Yan, L., Ma, X., Wang, S., Ren, P., Chen, Z., Yin, D., and Ren, Z. (2023, January 6\u201310). Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.923"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1162\/tacl_a_00660","article-title":"Automatically Correcting Large Language Models: Surveying the landscape of diverse automated correction strategies","volume":"12","author":"Pan","year":"2024","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_26","unstructured":"Gou, Z., Shao, Z., Gong, Y., Shen, Y., Yang, Y., Duan, N., and Chen, W. (2024, January 7\u201311). CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Jiang, Z., Xu, F.F., Gao, L., Sun, Z., Liu, Q., Dwivedi-Yu, J., Yang, Y., Callan, J., and Neubig, G. (2023, January 6\u201310). Active retrieval augmented generation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"ref_28","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_29","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). Large Language Models are Zero-Shot Reasoners. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_30","first-page":"11809","article-title":"Tree of thoughts: Deliberate problem solving with large language models","volume":"36","author":"Yao","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., and Cobbe, K. (2024, January 7\u201311). Let\u2019s Verify Step by Step. Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria."},{"key":"ref_32","unstructured":"Vanschoren, J., and Yeung, S. (2021). Measuring Mathematical Problem Solving with the MATH Dataset. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, Virtual, 6\u201314 December 2021, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_33","unstructured":"Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (2023). LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. Advances in Neural Information Processing Systems 36, Proceedings of the Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10\u201316 December 2023, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_34","unstructured":"Goldberg, Y., Kozareva, Z., and Zhang, Y. (2022). LILA: A Unified Benchmark for Mathematical Reasoning. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 7\u201311 December 2022, Association for Computational Linguistics."},{"key":"ref_35","unstructured":"Muresan, S., Nakov, P., and Villavicencio, A. (2022). NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, 22\u201327 May 2022, Association for Computational Linguistics."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, M., Yin, F., and Liu, C. (2023, January 19\u201325). A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, Macao SAR, China.","DOI":"10.24963\/ijcai.2023\/376"},{"key":"ref_37","unstructured":"Duh, K., Gomez, H., and Bethard, S. (2024). GOLD: Geometry Problem Solver with Natural Language Description. Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16\u201321 June 2024, Association for Computational Linguistics (ACL)."},{"key":"ref_38","unstructured":"Che, W., Nabende, J., Shutova, E., and Pilehvar, M.T. (2025). MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning. Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, 27 July\u20131 August 2025, Association for Computational Linguistics."},{"key":"ref_39","unstructured":"Duan, C., Sun, K., Fang, R., Zhang, M., Feng, Y., Luo, Y., Liu, Y., Wang, K., Pei, P., and Cai, X. (2025). CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images. arXiv."},{"key":"ref_40","unstructured":"Shi, W., Yu, A., Fang, R., Ren, H., Wang, K., Zhou, A., Tian, C., Fu, X., Hu, Y., and Lu, Z. (2025). MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning. arXiv."},{"key":"ref_41","unstructured":"Zheng, K., Han, J.M., and Polu, S. (2022, January 25\u201329). miniF2F: A cross-system benchmark for formal Olympiad-level mathematics. Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event."},{"key":"ref_42","first-page":"625","article-title":"The Lean 4 Theorem Prover and Programming Language","volume":"Volume 12699","author":"Platzer","year":"2021","journal-title":"Automated Deduction\u2014CADE 28, Proceedings of the 28th International Conference on Automated Deduction, Virtual Event, 12\u201315 July 2021"},{"key":"ref_43","unstructured":"Wang, H., Unsal, M., Lin, X., Baksys, M., Liu, J., Santos, M.D., Sung, F., Vinyes, M., Ying, Z., and Zhu, Z. (2025). Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning. arXiv."},{"key":"ref_44","unstructured":"Liu, H., Sun, J., Li, Z., and Yao, A.C. (2025). Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis. arXiv."},{"key":"ref_45","unstructured":"Zheng, C., Wang, H., Xie, E., Liu, Z., Sun, J., Xin, H., Shen, J., Li, Z., and Li, Y. (2024). Lyra: Orchestrating Dual Correction in Automated Theorem Proving. Trans. Mach. Learn. Res., Available online: https:\/\/openreview.net\/forum?id=Svt75kotzs."},{"key":"ref_46","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). HyperTree Proof Search for Neural Theorem Proving. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_47","unstructured":"Wang, H., Xin, H., Zheng, C., Liu, Z., Cao, Q., Huang, Y., Xiong, J., Shi, H., Xie, E., and Yin, J. (2024, January 7\u201311). LEGO-Prover: Neural Theorem Proving with Growing Libraries. Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria."},{"key":"ref_48","unstructured":"Song, P., Yang, K., and Anandkumar, A. (2024). Towards Large Language Models as Copilots for Theorem Proving in Lean. arXiv."},{"key":"ref_49","unstructured":"Petrov, I., Dekoninck, J., Baltadzhiev, L., Drencheva, M., Minchev, K., Balunovic, M., Jovanovic, N., and Vechev, M.T. (2025). Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad. arXiv."},{"key":"ref_50","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). Solving Quantitative Reasoning Problems with Language Models. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_51","unstructured":"Jiang, A.Q., Welleck, S., Zhou, J.P., Lacroix, T., Liu, J., Li, W., Jamnik, M., Lample, G., and Wu, Y. (2023, January 1\u20135). Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda."},{"key":"ref_52","unstructured":"Matuszwski, R., and Zalewska, A. (2007). The QED Manifesto Revisited. From Insight to Proof, Festschrift in Honour of Andrzej Trybulec, University of Bia\u0142ystok."},{"key":"ref_53","unstructured":"Voevodsky, V. (2014). The origins and motivations of the univalent foundations. The Institute Letter\u2014Institute for Advanced Study, Institute for Advanced Study."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1090\/bull\/1833","article-title":"Mathematical reasoning and the computer","volume":"61","author":"Buzzard","year":"2024","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1090\/bull\/1832","article-title":"Mathematics and the formal turn","volume":"61","author":"Avigad","year":"2024","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.1007\/s11858-024-01577-9","article-title":"Using the proof assistant Lean in undergraduate mathematics classrooms","volume":"56","author":"Hanna","year":"2024","journal-title":"ZDM\u2013Math. Educ."},{"key":"ref_57","unstructured":"Barwise, J. (1977). An introduction to first-order logic. Handbook of Mathematical Logic, Elsevier."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1093\/philmat\/7.1.5","article-title":"Why do we prove theorems?","volume":"7","author":"Rav","year":"1999","journal-title":"Philos. Math."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s11225-010-9280-4","article-title":"Informal proofs and mathematical rigour","volume":"96","year":"2010","journal-title":"Stud. Log."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"e2","DOI":"10.1017\/fmp.2017.1","article-title":"A formal proof of the Kepler conjecture","volume":"Volume 5","author":"Hales","year":"2017","journal-title":"Forum of Mathematics, Pi"},{"key":"ref_61","first-page":"333","article-title":"The Four Colour Theorem: Engineering of a Formal Proof","volume":"Volume 5081","author":"Kapur","year":"2007","journal-title":"Computer Mathematics, Proceedings of the 8th Asian Symposium, ASCM 2007, Singapore, 15\u201317 December 2007"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Gonthier, G., Asperti, A., Avigad, J., Bertot, Y., Cohen, C., Garillot, F., Roux, S.L., Mahboubi, A., O\u2019Connor, R., and Biha, S.O. (2013). A Machine-Checked Proof of the Odd Order Theorem. Interactive Theorem Proving, Proceedings of the 4th International Conference, ITP 2013, Rennes, France, 22\u201326 July 2013, Springer.","DOI":"10.1007\/978-3-642-39634-2_14"},{"key":"ref_63","first-page":"36:1","article-title":"Fermat\u2019s Last Theorem for Regular Primes (Short Paper)","volume":"Volume 268","author":"Naumowicz","year":"2023","journal-title":"14th International Conference on Interactive Theorem Proving, ITP 2023, Bia\u0142ystok, Poland, 31 July\u20134 August 2023"},{"key":"ref_64","first-page":"141","article-title":"Checking Natural Language Proofs","volume":"Volume 310","author":"Lusk","year":"1988","journal-title":"9th International Conference on Automated Deduction, Argonne, Illinois, USA, 23\u201326 May 1988, Proceedings"},{"key":"ref_65","unstructured":"Simon, D. (1990). Checking Number Theory Proofs in Natural Language. [Ph.D. Thesis, University of Texas at Austin]."},{"key":"ref_66","unstructured":"Zinn, C.W. (2004). Understanding Informal Mathematical Discourse. [Ph.D. Thesis, Universit\u00e4t Erlangen-N\u00fcrnberg]."},{"key":"ref_67","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). Autoformalization with Large Language Models. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_68","unstructured":"Lample, G., and Charton, F. (2020, January 26\u201330). Deep Learning For Symbolic Mathematics. Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia."},{"key":"ref_69","unstructured":"Zhou, J.P., Staats, C., Li, W., Szegedy, C., Weinberger, K.Q., and Wu, Y. (2024, January 7\u201311). Don\u2019t Trust: Verify\u2014Grounding LLM Quantitative Reasoning with Autoformalization. Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria."},{"key":"ref_70","first-page":"227","article-title":"Learning to Parse on Aligned Corpora (Rough Diamond)","volume":"Volume 9236","author":"Urban","year":"2015","journal-title":"Interactive Theorem Proving, Proceedings of the 6th International Conference, ITP 2015, Nanjing, China, 24\u201327 August 2015"},{"key":"ref_71","first-page":"12","article-title":"Automating Formalization by Statistical and Semantic Parsing of Mathematics","volume":"Volume 10499","year":"2017","journal-title":"Interactive Theorem Proving, Proceedings of the 8th International Conference, ITP 2017, Bras\u00edlia, Brazil, 26\u201329 September 2017"},{"key":"ref_72","first-page":"255","article-title":"First Experiments with Neural Translation of Informal to Formal Mathematics","volume":"Volume 11006","author":"Rabe","year":"2018","journal-title":"Intelligent Computer Mathematics, Proceedings of the 11th International Conference, CICM 2018, Hagenberg, Austria, 13\u201317 August 2018"},{"key":"ref_73","first-page":"19","article-title":"Automatic translation in formalized mathematics","volume":"5","author":"Bancerek","year":"2006","journal-title":"Mech. Math. Its Appl."},{"key":"ref_74","unstructured":"Zhang, L., Valentino, M., and Freitas, A. (2025). Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions. arXiv."},{"key":"ref_75","unstructured":"Chen, G., Wu, J., Chen, X., Zhao, W.X., Song, R., Li, C., Fan, K., Liu, D., and Liao, M. (2025). ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization. arXiv."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Cunningham, G., Bunescu, R.C., and Juedes, D. (2022, January 8). Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs. Proceedings of the 1st Workshop on Mathematical Natural Language Processing\u2014MathNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.mathnlp-1.4"},{"key":"ref_77","unstructured":"Pierce, B.C., de Amorim, A.A., Casinghino, C., Gaboardi, M., Greenberg, M., Hri\u0163cu, C., Sj\u00f6berg, V., Tolmach, A., and Yorgey, B. (2018). Programming Language Foundations, University of Pennsylvania."},{"key":"ref_78","first-page":"378","article-title":"Formal Proof Sketches","volume":"Volume 3085","author":"Berardi","year":"2003","journal-title":"Types for Proofs and Programs, Proceedings of the International Workshop, TYPES 2003, Torino, Italy, 30 April\u20134 May 2003, Revised Selected Papers"},{"key":"ref_79","first-page":"1","article-title":"Three Years of Experience with Sledgehammer, a Practical Link between Automatic and Interactive Theorem Provers","volume":"Volume 9","author":"Schmidt","year":"2010","journal-title":"PAAR-2010: Proceedings of the 2nd Workshop on Practical Aspects of Automated Reasoning, Edinburgh, Scotland, UK, 14 July 2010"},{"key":"ref_80","unstructured":"Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (2022). Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. Advances in Neural Information Processing Systems 35, Proceedings of the Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November\u20139 December 2022, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Wiedijk, F. (2006). The Seventeen Provers of the World, Foreword by Dana S. Scott, Springer.","DOI":"10.1007\/11542384"},{"key":"ref_82","unstructured":"S\u00f8rensen, M.H., and Urzyczyn, P. (2006). Lectures on the Curry-Howard Isomorphism, Volume 149 (Studies in Logic and the Foundations of Mathematics), Elsevier Science Inc."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1017\/S0960129509990041","article-title":"Social processes, program verification and all that","volume":"19","author":"Asperti","year":"2009","journal-title":"Math. Struct. Comput. Sci."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Bertot, Y., and Cast\u00e9ran, P. (2004). Proof by Reflection. Interactive Theorem Proving and Program Development: Coq\u2019Art: The Calculus of Inductive Constructions, Springer.","DOI":"10.1007\/978-3-662-07964-5"},{"key":"ref_85","first-page":"95","article-title":"An introduction to small scale reflection in Coq","volume":"3","author":"Gonthier","year":"2010","journal-title":"J. Formaliz. Reason."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1021935419355","article-title":"A Comparison of Mizar and Isar","volume":"29","author":"Wenzel","year":"2002","journal-title":"J. Autom. Reason."},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Asperti, A. (2012). Proof, Message and Certificate. Intelligent Computer Mathematics, Proceedings of the 11th International Conference, AISC 2012, 19th Symposium, Calculemus 2012, 5th International Workshop, DML 2012, 11th International Conference, MKM 2012, Systems and Projects, Held as Part of CICM 2012, Bremen, Germany, 8\u201313 July 2012, Springer.","DOI":"10.1007\/978-3-642-31374-5_2"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Siekmann, J.H., and Wrightson, G. (1983). AUTOMATH, a Language for Mathematics. Automation of Reasoning: 2: Classical Papers on Computational Logic 1967\u20131970, Springer.","DOI":"10.1007\/978-3-642-81955-1"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Asperti, A., and Sacerdoti Coen, C. (2010). Some Considerations on the Usability of Interactive Provers. Intelligent Computer Mathematics, Proceedings of the 10th International Conference, AISC 2010, 17th Symposium, Calculemus 2010, and 9th International Conference, MKM 2010, Paris, France, 5\u201310 July 2010, Springer.","DOI":"10.1007\/978-3-642-14128-7_13"},{"key":"ref_90","unstructured":"Ku, L., Martins, A., and Srikumar, V. (2024). Forward-Backward Reasoning in Large Language Models for Mathematical Verification. Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand, Virtual Meeting, 11\u201316 August 2024, Association for Computational Linguistics."},{"key":"ref_91","unstructured":"Li, B.Z., Guo, Z.C., and Andreas, J. (2025, January 13\u201319). (How) Do Language Models Track State?. Proceedings of the Forty-Second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada."},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Du, W., Jin, D., Fu, J., and Jin, Z. (2025). Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, 27 July\u20131 August 2025, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2025.acl-long.668"},{"key":"ref_93","unstructured":"Herrmann, V., Csord\u00e1s, R., and Schmidhuber, J. (2025, January 13\u201319). Measuring In-Context Computation Complexity via Hidden State Prediction. Proceedings of the Forty-Second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada."},{"key":"ref_94","unstructured":"Rezaee, K., Camacho-Collados, J., and Pilehvar, M.T. (2025). Exploring State Tracking Capabilities of Large Language Models. arXiv."},{"key":"ref_95","unstructured":"Turpin, M., Michael, J., Perez, E., and Bowman, S.R. (2023). Language Models Don\u2019t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. Advances in Neural Information Processing Systems 36, Proceedings of the Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10\u201316 December 2023, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_96","unstructured":"Xiong, W., Zhang, H., Ye, C., Chen, L., Jiang, N., and Zhang, T. (2025). Self-rewarding correction for mathematical reasoning. arXiv."},{"key":"ref_97","unstructured":"Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36, Proceedings of the Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10\u201316 December 2023, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_98","unstructured":"Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., and Yang, Y. (2023, January 10\u201316). SELF-REFINE: Iterative refinement with self-feedback. Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS\u201923, New Orleans, LA, USA."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/10\/1\/38\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T05:12:38Z","timestamp":1769231558000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/10\/1\/38"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,22]]},"references-count":98,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["bdcc10010038"],"URL":"https:\/\/doi.org\/10.3390\/bdcc10010038","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,22]]}}}