{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T13:26:33Z","timestamp":1777728393267,"version":"3.51.4"},"reference-count":26,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2025,8,1]],"date-time":"2025-08-01T00:00:00Z","timestamp":1754006400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,8,1]],"date-time":"2025-08-01T00:00:00Z","timestamp":1754006400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Intelligenza Artificiale"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>We present an initial automated test to evaluate LLMs\u2019 capacity to perform inductive reasoning tasks. We use the GPT-3.5 and GPT-4 models to create a system which generates Python code as hypotheses for inductive reasoning to transform sequences of the One Dimensional Abstract Reasoning Corpus (1D-ARC) challenge. We experiment with three prompting techniques, namely standard prompting, Chain of Thought (CoT), and direct feedback. We provide results and an analysis of cost-to-success rate and benefit-cost ratio. Our best result is an overall 25% success rate with our CoT prompting on GPT-4, significantly surpassing the standard prompting approach. We assess the programming capabilities of the LLM by analysing the execution rate and errors of the generated code for inductive reasoning. We discuss potential avenues to improve our experiments, testing other strategies, and combining deductive reasoning with LLM-based inductive reasoning.<\/jats:p>","DOI":"10.1177\/17248035251363882","type":"journal-article","created":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T12:07:06Z","timestamp":1758715626000},"page":"102-115","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Evaluating Inductive Reasoning and Programming Capabilities of Large Language Models With The One-Dimensional Abstract Reasoning Corpus"],"prefix":"10.1177","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2004-6378","authenticated-orcid":false,"given":"C\u00e9dric","family":"Mesnage","sequence":"first","affiliation":[{"name":"Institute for Data Science and Artificial Intelligence, University of Exeter, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9332-2700","authenticated-orcid":false,"given":"Xiaoyang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Exeter, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6828-6891","authenticated-orcid":false,"given":"Hang","family":"Dong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Exeter, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4386-9745","authenticated-orcid":false,"family":"Aishwaryaprajna","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Exeter, UK"}]}],"member":"179","published-online":{"date-parts":[[2025,8,8]]},"reference":[{"key":"e_1_3_3_2_1","doi-asserted-by":"crossref","unstructured":"Bommarito I. I. M. Katz D. M. (2022). GPT takes the bar exam. arXiv preprint arXiv:2212.14402.","DOI":"10.2139\/ssrn.4314839"},{"key":"e_1_3_3_3_1","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown T.","year":"2020","unstructured":"Brown T., Mann B., Ryder N., Subbiah M., Kaplan J. D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., et al (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877\u20131901.","journal-title":"Advances in neural information processing systems"},{"key":"e_1_3_3_4_1","unstructured":"Chollet F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547."},{"key":"e_1_3_3_5_1","unstructured":"Chollet F. (2024). OpenAI o3 breakthrough high score on arc-agi-pub. https:\/\/arcprize.org\/blog\/oai-o3-pubbreakthrough. ARC Prize Foundation."},{"key":"e_1_3_3_6_1","doi-asserted-by":"crossref","unstructured":"H\u00e4ggstr\u00f6m O. (2021). Artificial general intelligence and the common sense argument. In Conference on philosophy and theory of artificial intelligence (pp. 155\u2013160). Springer.","DOI":"10.1007\/978-3-031-09153-7_12"},{"key":"e_1_3_3_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cogsys.2023.101155"},{"key":"e_1_3_3_8_1","doi-asserted-by":"crossref","unstructured":"Imani S. Du L. Shrivastava H. (2023). Mathprompter: Mathematical reasoning using large language models. arXiv preprint arXiv:2303.05398.","DOI":"10.18653\/v1\/2023.acl-industry.4"},{"key":"e_1_3_3_9_1","unstructured":"Kalyan A. Mohta A. Polozov O. Batra D. Jain P. Gulwani S. (2018). Neural-guided deductive search for real-time program synthesis from examples. In International conference on learning representations (pp. 1\u201315)."},{"key":"e_1_3_3_10_1","first-page":"22199","article-title":"Large language models are zero-shot reasoners","volume":"35","author":"Kojima T.","year":"2022","unstructured":"Kojima T., Gu S. S., Reid M., Matsuo Y., Iwasawa Y. (2022). Large language models are zero-shot reasoners. Advances in neural information processing systems, 35, 22199\u201322213.","journal-title":"Advances in neural information processing systems"},{"key":"e_1_3_3_11_1","first-page":"9459","volume-title":"Advances in neural information processing systems","author":"Lewis P.","year":"2020","unstructured":"Lewis P., Perez E., Piktus A., Petroni F., Karpukhin V., Goyal N., K\u00fcttler H., Lewis M., Yih Wt., Rockt\u00e4schel T., Riedel S., Kiela D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. In Larochelle H., Ranzato M., Hadsell R., Balcan M., Lin H. (Eds.), Advances in neural information processing systems (Vol. 33, pp. 9459\u20139474). Curran Associates, Inc."},{"key":"e_1_3_3_12_1","first-page":"21558","volume-title":"Advances in neural information processing systems","author":"Liu J.","year":"2023","unstructured":"Liu J., Xia C. S., Wang Y., Zhang L. (2023). Is your code generated by ChatGPT really correct? rigorous evaluation of large language models for code generation. In Oh A., Naumann T., Globerson A., Saenko K., Hardt M., Levine S. (Eds.), Advances in neural information processing systems (Vol. 36, pp. 21558\u201321572). Curran Associates, Inc."},{"key":"e_1_3_3_13_1","doi-asserted-by":"crossref","unstructured":"Mesnage C. Wang X. Dong H. et al (2025). Evaluating inductive reasoning capabilities of large language models with the one dimensional abstract reasoning corpus. In International workshop on hybrid models for coupling deductive and inductive reasoning (pp. 35\u201351). Springer.","DOI":"10.1007\/978-3-031-89366-7_3"},{"key":"e_1_3_3_14_1","unstructured":"Mirchandani S. Xia F. Florence P. Ichter B. Driess D. Arenas M. G. Rao K. Sadigh D. Zeng A. (2023). Large language models as general pattern machines. arXiv preprint arXiv:2307.04721."},{"key":"e_1_3_3_15_1","unstructured":"Morris M. R. Sohl-dickstein J. Fiedel N. Warkentin T. Dafoe A. Faust A. Farabet C. Legg S. (2023). Levels of agi: Operationalizing progress on the path to agi. arXiv preprint arXiv:2311.02462."},{"key":"e_1_3_3_16_1","unstructured":"OpenAI (2023) GPT-4 technical report."},{"key":"e_1_3_3_17_1","unstructured":"Touvron H. Lavril T. Izacard G. Martinet X. Lachaux M. A. Lacroix T. Rozi\u00e8re B. Goyal N. Hambro E. Azhar F. et al (2023). Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971."},{"key":"e_1_3_3_18_1","doi-asserted-by":"crossref","unstructured":"Wang B. Yue X. Sun H. (2023a). Can ChatGPT defend its belief in truth? evaluating LLM reasoning via debate. In Findings of the association for computational linguistics: EMNLP 2023 (pp. 11865\u201311881).","DOI":"10.18653\/v1\/2023.findings-emnlp.795"},{"key":"e_1_3_3_19_1","unstructured":"Wang K. Ren H. Zhou A. Lu Z. Luo S. Shi W. Zhang R. Song L. Zhan M. Li H. (2023b). Mathcoder: Seamless code integration in LLMs for enhanced mathematical reasoning. arXiv preprint arXiv:2310.03731."},{"key":"e_1_3_3_20_1","unstructured":"Wang R. Zelikman E. Poesia G. Pu Y. Haber N. Goodman N. D. (2023c). Hypothesis search: Inductive reasoning with language models."},{"key":"e_1_3_3_21_1","doi-asserted-by":"crossref","unstructured":"Wang S. Wei Z. Choi Y. Ren X. (2024). Can LLMs reason with rules? logic scaffolding for stress-testing and improving LLMs. arXiv preprint arXiv:2402.11442.","DOI":"10.18653\/v1\/2024.acl-long.406"},{"key":"e_1_3_3_22_1","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei J.","year":"2022","unstructured":"Wei J., Wang X., Schuurmans D., Bosma M., Xia F., Chi E., Le Q. V., Zhou D., et al (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824\u201324837.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_23_1","unstructured":"Xu Y. Li W. Vaezipoor P. Sanner S. Khalil E. B. (2023). LLMs and the abstraction and reasoning corpus: Successes failures and the importance of object-based representations."},{"key":"e_1_3_3_24_1","unstructured":"Yao S. Yu D. Zhao J. Shafran I. Griffiths T. L. Cao Y. Narasimhan K. (2023). Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601."},{"key":"e_1_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00632"},{"key":"e_1_3_3_26_1","article-title":"Large language models as commonsense knowledge for large-scale task planning","volume":"36","author":"Zhao Z.","year":"2024","unstructured":"Zhao Z., Lee W. S., Hsu D. (2024). Large language models as commonsense knowledge for large-scale task planning. Advances in Neural Information Processing Systems, 36, 31967\u201331987.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_27_1","unstructured":"Zhong T. Liu Z. Pan Y. Zhang Y. Zhou Y. Liang S. Wu Z. Lyu Y. Shu P. Yu X. et al. (2024). Evaluation of openai o1: Opportunities and challenges of agi. arXiv preprint arXiv:2409.18486."}],"container-title":["Intelligenza Artificiale"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/17248035251363882","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/17248035251363882","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/17248035251363882","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:51:57Z","timestamp":1777459917000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/17248035251363882"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":26,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.1177\/17248035251363882"],"URL":"https:\/\/doi.org\/10.1177\/17248035251363882","relation":{},"ISSN":["1724-8035","2211-0097"],"issn-type":[{"value":"1724-8035","type":"print"},{"value":"2211-0097","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8]]}}}