{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T19:56:10Z","timestamp":1773518170858,"version":"3.50.1"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T00:00:00Z","timestamp":1772150400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2026,3,1]]},"abstract":"<jats:p>Formal reasoning has long been fundamental to various fields of computer science, including AI in its early days. In contrast, large language models (LLMs)\u2014the cornerstone of modern AI\u2014perform reasoning through autoregressive next-word prediction, without grounding their outputs in formal systems. This \u201cinformal\u201d approach can learn world knowledge and reasoning patterns from large-scale data without rigid formalization. It offers significantly greater flexibility than traditional formal reasoning and has achieved promising results on many benchmarks. However, LLMs heavily rely on data and do not guarantee the soundness of reasoning. In this article, we highlight recent efforts to integrate modern LLMs with formal methods, an approach that seeks to harness the strengths of both paradigms. 
Such integration has the potential to lead to major advancements in AI-driven mathematics, formal verification, and the verifiable generation of computer systems.<\/jats:p>","DOI":"10.1145\/3750038","type":"journal-article","created":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T21:57:02Z","timestamp":1770760622000},"page":"66-73","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification"],"prefix":"10.1145","volume":"69","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2777-612X","authenticated-orcid":false,"given":"Kaiyu","family":"Yang","sequence":"first","affiliation":[{"name":"Meta FAIR, New York, New York, United States"}]},{"given":"Gabriel","family":"Poesia","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, California, United States"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9858-5640","authenticated-orcid":false,"given":"Jingxuan","family":"He","sequence":"additional","affiliation":[{"name":"UC Berkeley, Berkeley, California, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9886-9542","authenticated-orcid":false,"given":"Wenda","family":"Li","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, United Kingdom"}]},{"given":"Kristin","family":"Lauter","sequence":"additional","affiliation":[{"name":"Meta FAIR, Seattle, Washington, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6859-1391","authenticated-orcid":false,"given":"Swarat","family":"Chaudhuri","sequence":"additional","affiliation":[{"name":"UT Austin, Computer Science, Austin, Texas, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9745-6802","authenticated-orcid":false,"given":"Dawn","family":"Song","sequence":"additional","affiliation":[{"name":"UC Berkeley, Berkeley, California, United 
States"}]}],"member":"320","published-online":{"date-parts":[[2026,2,27]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Azerbayev Z. et al. Llemma: An open language model for mathematics. In Intern. Conf. on Learning Representations (2024)."},{"key":"e_1_3_1_3_2","volume-title":"The Coq Proof Assistant Reference Manual: Version 6.1","author":"Barras B.","year":"1997","unstructured":"Barras, B. et al. The Coq Proof Assistant Reference Manual: Version 6.1. Ph.D. Dissertation, Inria (1997)."},{"key":"e_1_3_1_4_2","unstructured":"Blaauwbroek L. et al. Graph2Tac: Online representation learning of formal math concepts. In Intern. Conf. on Machine Learning (2024)."},{"key":"e_1_3_1_5_2","unstructured":"Buzzard K. Can AI do maths yet? Thoughts from a mathematician. Xenaproject (Dec. 22 2024); https:\/\/xenaproject.wordpress.com\/2024\/12\/22\/can-ai-do-maths-yet-thoughts-from-a-mathematician\/"},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","unstructured":"de Moura L. et al. The Lean theorem prover (system description). In Intern. Conf. on Automated Deduction (2015).","DOI":"10.1007\/978-3-319-21401-6_26"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Ferdowsi K. et al. Validating AI-generated code with live programming. In Conf. on Human Factors in Computing Systems (2024).","DOI":"10.1145\/3613904.3642495"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"First E. Rabe M.N. Ringer T. and Brun Y. Baldur: Whole-proof generation and repair with large language models. In ACM Joint European Software Engineering Conf. and Symp. on the Foundations of Software Engineering (2023).","DOI":"10.1145\/3611643.3616243"},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","unstructured":"G\u00f6del K. \u00dcber formal unentscheidbare S\u00e4tze der Principia Mathematica und verwandter Systeme I. 
Monatshefte F\u00fcr Mathematik Und Physik (1931).","DOI":"10.1007\/BF01700692"},{"key":"e_1_3_1_10_2","unstructured":"Google DeepMind.\u00a0Google AI achieves silver-medal standard solving International Mathematical Olympiad problems (Jul. 25 2024); https:\/\/deepmind.google\/discover\/blog\/ai-solves-imo-problems-at-silver-medal-level\/"},{"key":"e_1_3_1_11_2","unstructured":"Howard W.A. The formulae-as-types notion of construction. In To HB Curry: Essays on Combinatory Logic Lambda Calculus and Formalism (1980)."},{"key":"e_1_3_1_12_2","unstructured":"Jiang A.Q. et al. Draft sketch and prove: Guiding formal theorem provers with informal proofs. In Intern. Conf. on Learning Representations (2023)."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Kaliszyk C. and Urban J. Lemma mining over HOL Light. In Intern. Conf. on Logic for Programming Artificial Intelligence and Reasoning (2013).","DOI":"10.1007\/978-3-642-45221-5_34"},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","unstructured":"Karpukhin V. et al. Dense passage retrieval for open-domain question answering. In Conf. on Empirical Methods in Natural Language Processing (2020).","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Klein G. et al. seL4: Formal verification of an OS kernel. In Symp. on Operating Systems Principles (2009).","DOI":"10.1145\/1629575.1629596"},{"key":"e_1_3_1_16_2","unstructured":"Lample G. et al. HyperTree proof search for neural theorem proving. In Neural Information Processing Systems (2022)."},{"key":"e_1_3_1_17_2","unstructured":"Lewis M. and Mitchell M. Using counterfactual tasks to evaluate the generality of analogical reasoning in large language models. In Proceedings of the Annual Meeting of the Cognitive Science Society (2024)."},{"key":"e_1_3_1_18_2","unstructured":"Li Z. et al. A survey on deep learning for theorem proving. In Conf. 
on Language Modeling (2024)."},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","unstructured":"Li Z. et al. Autoformalize mathematical statements by symbolic equivalence and semantic consistency. In Neural Information Processing Systems (2024).","DOI":"10.52202\/079017-1697"},{"key":"e_1_3_1_20_2","unstructured":"Lightman H. et al. Let\u2019s verify step by step. In Intern. Conf. on Learning Representations (2024)."},{"key":"e_1_3_1_21_2","unstructured":"Lohn E. and Welleck S. miniCodeProps: A minimal benchmark for proving code properties. arXiv preprint arXiv:2406.11915 (2024)."},{"key":"e_1_3_1_22_2","unstructured":"Lu J. et al. Process-driven autoformalization in Lean 4. arXiv preprint arXiv:2406.01940 (2024)."},{"key":"e_1_3_1_23_2","unstructured":"Mathlib community. The Lean mathematical library. In Certified Programs and Proofs (2020)."},{"key":"e_1_3_1_24_2","unstructured":"Murphy L. et al. Autoformalizing Euclidean geometry. In Intern. Conf. on Machine Learning (2024)."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Newell A. and Simon H. The logic theory machine\u2013a complex information processing system. IRE Transactions on Information Theory (1956).","DOI":"10.1109\/TIT.1956.1056797"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45949-9"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Poesia G. Broman D. Haber N. and Goodman N.D. Learning formal mathematics from intrinsic motivation. In Neural Information Processing Systems (2024).","DOI":"10.52202\/079017-1362"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Poesia G. and Goodman N.D. Peano: Learning formal mathematical reasoning. Philosophical Transactions of the Royal Society A. (2023).","DOI":"10.1098\/rsta.2022.0044"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2471838"},{"key":"e_1_3_1_30_2","unstructured":"Song P. Yang K. and Anandkumar A. 
Lean Copilot: Large language models as copilots for theorem proving in Lean. In Intern. Conf. on Neuro-symbolic Systems (2025)."},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","unstructured":"Sun C. Sheng Y. Padon O. and Barrett C. Clover: Closed-loop verifiable code generation. In Intern. Symp. on AI Verification (2024).","DOI":"10.1007\/978-3-031-65112-0_7"},{"key":"e_1_3_1_32_2","unstructured":"Thakur A. et al. CLEVER: A curated benchmark for formally verified code generation. arXiv preprint arXiv:2505.13938 (2025)."},{"key":"e_1_3_1_33_2","unstructured":"Thakur A. et al. An in-context learning agent for formal theorem-proving. In Conf. on Language Modeling (2024)."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06747-5"},{"key":"e_1_3_1_35_2","unstructured":"Turing A. On computable numbers with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society (1936)."},{"key":"e_1_3_1_36_2","unstructured":"Wang H. et al. Proving theorems recursively. In Neural Information Processing Systems (2024)."},{"key":"e_1_3_1_37_2","unstructured":"Wu Y. et al. Autoformalization with large language models. In Neural Information Processing Systems (2022)."},{"key":"e_1_3_1_38_2","unstructured":"Xin H. et al. LEGO-prover: Neural theorem proving with growing libraries. In Intern. Conf. on Learning Representations (2023)."},{"key":"e_1_3_1_39_2","unstructured":"Yang K. et al. LeanDojo: Theorem proving with retrieval-augmented language models. In Neural Information Processing Systems (2023)."},{"key":"e_1_3_1_40_2","unstructured":"Ye Z. et al. VERINA: Benchmarking verifiable code generation. arXiv preprint arXiv:2505.23135 (2025)."},{"key":"e_1_3_1_41_2","unstructured":"Zhou J.P. et al. Don\u2019t trust: Verify\u2013grounding LLM quantitative reasoning with autoformalization. In Intern. Conf. 
on Learning Representations (2024)."}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3750038","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T15:08:38Z","timestamp":1773500918000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3750038"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,27]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3,1]]}},"alternative-id":["10.1145\/3750038"],"URL":"https:\/\/doi.org\/10.1145\/3750038","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"value":"0001-0782","type":"print"},{"value":"1557-7317","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,27]]},"assertion":[{"value":"2025-03-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}