{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:40:45Z","timestamp":1777657245736,"version":"3.51.4"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"name":"Accelerate Foundation Models Research Initiative"},{"name":"Development of Artificial Complex Intelligence for Conceptually Understanding and Inferring Like Human","award":["RS-2023-00216011"],"award-info":[{"award-number":["RS-2023-00216011"]}]},{"name":"Testbed for Abstraction and Reasoning","award":["RS-2023-00240062"],"award-info":[{"award-number":["RS-2023-00240062"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using the Abstraction and Reasoning Corpus (ARC) benchmark to evaluate the inference and contextual understanding abilities of LLMs in a process-centric manner, focusing on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity. Our carefully designed experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects. 
The main contribution of this article lies in introducing the LoTH perspective, which provides a method for evaluating the reasoning process that conventional results-oriented approaches fail to capture, thereby offering new insights into the development of human-level reasoning in artificial intelligence systems.<\/jats:p>","DOI":"10.1145\/3712701","type":"journal-article","created":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T12:20:48Z","timestamp":1737375648000},"page":"1-52","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-1133-0848","authenticated-orcid":false,"given":"Seungpil","family":"Lee","sequence":"first","affiliation":[{"name":"Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9215-0123","authenticated-orcid":false,"given":"Woochang","family":"Sim","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1587-1268","authenticated-orcid":false,"given":"Donghyeon","family":"Shin","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8982-1031","authenticated-orcid":false,"given":"Wongyu","family":"Seo","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Republic of 
Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4323-5684","authenticated-orcid":false,"given":"Jiwon","family":"Park","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4070-6927","authenticated-orcid":false,"given":"Seokki","family":"Lee","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4697-2360","authenticated-orcid":false,"given":"Sanha","family":"Hwang","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3328-5757","authenticated-orcid":false,"given":"Sejin","family":"Kim","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9687-2409","authenticated-orcid":false,"given":"Sundong","family":"Kim","sequence":"additional","affiliation":[{"name":"AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2025,11,24]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"3731","volume-title":"NeurIPS","author":"Acquaviva Samuel","year":"2022","unstructured":"Samuel Acquaviva, Yewen Pu, Marta Kryven, Theodoros Sechopoulos, Catherine Wong, Gabrielle Ecanow, Maxwell Nye, Michael Tessler, and Joshua B. Tenenbaum. 2022. Communicating natural programs to humans and machines. In NeurIPS, 3731\u20133743."},{"key":"e_1_3_2_3_2","unstructured":"Ekin Aky\u00fcrek Mehul Damani Linlu Qiu Han Guo Yoon Kim and Jacob Andreas. 2024. The surprising effectiveness of test-time training for abstract reasoning. arXiv:2411.07279. 
Retrieved from https:\/\/arxiv.org\/abs\/2411.07279"},{"key":"e_1_3_2_4_2","unstructured":"Zeyuan Allen-Zhu and Yuanzhi Li. 2023. Physics of language models: Part 3.2 Knowledge manipulation. arXiv:2309.14402. Retrieved from https:\/\/arxiv.org\/abs\/2309.14402"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i16.29720"},{"key":"e_1_3_2_6_2","first-page":"8","volume-title":"IJCAI Workshop","author":"Biran Or","year":"2017","unstructured":"Or Biran and Courtenay Cotton. 2017. Explanation and justification in machine learning: A survey. In IJCAI Workshop, 8\u201313."},{"key":"e_1_3_2_7_2","unstructured":"Alexey Borsky. 2021. ARC-Game. Retrieved from https:\/\/github.com\/volotat\/ARC-Game"},{"key":"e_1_3_2_8_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg et\u00a0al. 2023. Sparks of Artificial General Intelligence: Early Experiments with GPT-4. arXiv:2303.12712. Retrieved from https:\/\/arxiv.org\/abs\/2303.12712"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3641289"},{"key":"e_1_3_2_10_2","unstructured":"Fran\u00e7ois Chollet. 2019. On the measure of intelligence. arXiv:1911.01547. Retrieved from https:\/\/arxiv.org\/abs\/1911.01547"},{"key":"e_1_3_2_11_2","unstructured":"Fran\u00e7ois Chollet. 2024. OpenAI o3 breakthrough high score on ARC-AGI-pub. Retrieved from https:\/\/arcprize.org\/blog\/oai-o3-pub-breakthrough"},{"key":"e_1_3_2_12_2","volume-title":"Technical report","author":"Chollet Francois","year":"2024","unstructured":"Francois Chollet, Mike Knoop, Gregory Kamradt, and Bryan Landers. 2024. ARC prize 2024: Technical report. arXiv:2412.04604. Retrieved from https:\/\/arxiv.org\/abs\/2412.04604"},{"key":"e_1_3_2_13_2","first-page":"4302","volume-title":"NeurIPS","author":"Christiano Paul F.","year":"2017","unstructured":"Paul F. 
Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In NeurIPS, 4302\u20134310."},{"key":"e_1_3_2_14_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano et\u00a0al. 2021. Training verifiers to solve math word problems. arXiv:2110.14168. Retrieved from https:\/\/arxiv.org\/abs\/2110.14168"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.261"},{"key":"e_1_3_2_16_2","unstructured":"Gon\u00e7alo Hora de Carvalho Robert Pollice and Oscar Knap. 2024. Show don\u2019t tell: Evaluating large language models beyond textual understanding with childplay. arXiv:2407.11068. Retrieved from https:\/\/arxiv.org\/abs\/2407.11068"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.95"},{"key":"e_1_3_2_18_2","first-page":"2368","volume-title":"NAACL","author":"Dua Dheeru","year":"2019","unstructured":"Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In NAACL, 2368\u20132378."},{"key":"e_1_3_2_19_2","first-page":"70293","volume-title":"NeurIPS","author":"Dziri Nouha","year":"2023","unstructured":"Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, et\u00a0al. 2023. Faith and fate: Limits of transformers on compositionality. In NeurIPS, 70293\u201370332."},{"key":"e_1_3_2_20_2","volume-title":"The Language of Thought","author":"Fodor Jerry A.","year":"1975","unstructured":"Jerry A. Fodor. 1975. The Language of Thought. Vol. 5. 
Harvard University Press."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/0010-0277(88)90031-5"},{"key":"e_1_3_2_22_2","first-page":"10764","volume-title":"ICML","author":"Gao Luyu","year":"2023","unstructured":"Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. PAL: Program-aided language models. In ICML, 10764\u201310799."},{"key":"e_1_3_2_23_2","volume-title":"Frames of Mind: The Theory of Multiple Intelligences","author":"Gardner Howard E.","year":"2011","unstructured":"Howard E. Gardner. 2011. Frames of Mind: The Theory of Multiple Intelligences. Basic Books."},{"key":"e_1_3_2_24_2","first-page":"6270","volume-title":"IJCAI","author":"Gendron Ga\u00ebl","year":"2024","unstructured":"Ga\u00ebl Gendron, Qiming Bao, Michael Witbrock, and Gillian Dobbie. 2024. Large language models are not strong abstract reasoners. In IJCAI, 6270\u20136278."},{"key":"e_1_3_2_25_2","volume-title":"ICLR","author":"Gurnee Wes","year":"2024","unstructured":"Wes Gurnee and Max Tegmark. 2024. Language models represent space and time. In ICLR."},{"key":"e_1_3_2_26_2","unstructured":"Michael Hodel. 2024. Addressing the abstraction and reasoning corpus via procedural example generation. arXiv:2404.07353. Retrieved from https:\/\/arxiv.org\/abs\/2404.07353"},{"key":"e_1_3_2_27_2","first-page":"1049","volume-title":"ACL Findings","author":"Huang Jie","year":"2023","unstructured":"Jie Huang and Kevin Chen-Chuan Chang. 2023. Towards reasoning in LLMs: A survey. In ACL Findings, 1049\u20131065."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11674"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.149"},{"key":"e_1_3_2_30_2","first-page":"2471","volume-title":"CogSci","author":"Johnson Aysja","year":"2021","unstructured":"Aysja Johnson, Wai Keen Vong, Brenden M. Lake, and Todd M. Gureckis. 2021. 
Fast and flexible: Human program induction in abstract reasoning tasks. In CogSci, 2471\u20132477."},{"key":"e_1_3_2_31_2","volume-title":"NeurIPS Workshop on nCSI","author":"Kim Subin","year":"2022","unstructured":"Subin Kim, Prin Phunyaphibarn, Donghyun Ahn, and Sundong Kim. 2022. Playgrounds for abstraction and reasoning. In NeurIPS Workshop on nCSI."},{"key":"e_1_3_2_32_2","first-page":"21487","volume-title":"NeurIPS","author":"Koh Jing Yu","year":"2024","unstructured":"Jing Yu Koh, Daniel Fried, and Russ R. Salakhutdinov. 2024. Generating images with multimodal language models. In NeurIPS, 21487\u201321506."},{"key":"e_1_3_2_33_2","unstructured":"Lab42. 2024. ARC-PRIZE Competition. Retrieved from https:\/\/arcprize.org\/"},{"key":"e_1_3_2_34_2","first-page":"2873","volume-title":"ICML","author":"Lake Brenden","year":"2018","unstructured":"Brenden Lake and Marco Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In ICML. PMLR, 2873\u20132882."},{"key":"e_1_3_2_35_2","first-page":"3843","volume-title":"NeurIPS","author":"Lewkowycz Aitor","year":"2022","unstructured":"Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et\u00a0al. 2022. Solving quantitative reasoning problems with language models. In NeurIPS, 3843\u20133857."},{"key":"e_1_3_2_36_2","volume-title":"Combining induction and transduction for abstract reasoning","author":"Li Wen-Ding","year":"2024","unstructured":"Wen-Ding Li, Keya Hu, Carter Larsen, Yuqing Wu, Simon Alford, Caleb Woo, Spencer M. Dunn, Hao Tang, Michelangelo Naim, Dat Nguyen, et\u00a0al. 2024. Combining induction and transduction for abstract reasoning. arXiv:2411.02272. 
Retrieved from https:\/\/arxiv.org\/abs\/2411.02272"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.576"},{"key":"e_1_3_2_38_2","unstructured":"Mushui Liu Yuhang Ma Xinfeng Zhang Yang Zhen Zeng Zhao Zhipeng Hu Bai Liu and Changjie Fan. 2024. LLM4GEN: Leveraging semantic representation of LLMs for text-to-image generation. arXiv:2407.00737. Retrieved from https:\/\/arxiv.org\/abs\/2407.00737"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i03.5681"},{"key":"e_1_3_2_40_2","volume-title":"ICLR","author":"Lu Pan","year":"2024","unstructured":"Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. 2024. MathVista: evaluating math reasoning in visual contexts with GPT-4V, Bard, and other large multimodal models. In ICLR."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.naacl-long.39"},{"key":"e_1_3_2_42_2","volume-title":"ICLR","author":"Ma Xiaojian","year":"2023","unstructured":"Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, and Siyuan Huang. 2023. SQA3D: Situated question answering in 3D scenes. In ICLR."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02478259"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.11.011"},{"key":"e_1_3_2_45_2","volume-title":"CoRL","author":"Mirchandani Suvir","year":"2023","unstructured":"Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, and Andy Zeng. 2023. Large language models as general pattern machines. CoRL."},{"key":"e_1_3_2_46_2","unstructured":"Arseny Moskvichev Victor Vikram Odouard and Melanie Mitchell. 2023. The ConceptARC benchmark: Evaluating understanding and generalization in the ARC domain. 
TMLR."},{"key":"e_1_3_2_47_2","first-page":"16468","volume-title":"NeurIPS","author":"Nie Weili","year":"2020","unstructured":"Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, and Anima Anandkumar. 2020. Bongard-Logo: A new benchmark for human-level concept learning and reasoning. In NeurIPS, 16468\u201316480."},{"key":"e_1_3_2_48_2","first-page":"27730","volume-title":"NeurIPS","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et\u00a0al. 2022. Training language models to follow instructions with Human feedback. In NeurIPS, 27730\u201327744."},{"key":"e_1_3_2_49_2","volume-title":"ICML Workshop","author":"Park Jaehyun","year":"2023","unstructured":"Jaehyun Park, Jaegyun Im, Sanha Hwang, Mintaek Lim, Sabina Ualibekova, Sejin Kim, and Sundong Kim. 2023. Unraveling the ARC puzzle: Mimicking human solutions with object-centric decision transformer. ICML Workshop."},{"key":"e_1_3_2_50_2","volume-title":"ICLR","author":"Qiu Linlu","year":"2024","unstructured":"Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, et\u00a0al. 2024. Phenomenal yet puzzling: Testing inductive reasoning capabilities of language models with hypothesis refinement. In ICLR."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1487"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1037\/h0042519"},{"key":"e_1_3_2_53_2","volume-title":"Artificial Intelligence: A Modern Approach","author":"Russell Stuart J.","year":"1995","unstructured":"Stuart J. Russell and Peter Norvig. 1995. Artificial Intelligence: A Modern Approach. 
Pearson."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.392"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01519"},{"key":"e_1_3_2_56_2","volume-title":"Theoretical and experimental practices","author":"Sinha Sania","year":"2024","unstructured":"Sania Sinha, Tanawan Premsri, and Parisa Kordjamshidi. 2024. A survey on compositional learning of AI models: Theoretical and experimental practices. TMLR."},{"key":"e_1_3_2_57_2","first-page":"4149","volume-title":"NAACL","author":"Talmor Alon","year":"2019","unstructured":"Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2019. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In NAACL, 4149\u20134158."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.501"},{"key":"e_1_3_2_59_2","first-page":"236","article-title":"Computing machinery and intelligence","volume":"59","author":"Turing Alan","year":"1950","unstructured":"Alan Turing. 1950. Computing machinery and intelligence. Mind 59, 236 (1950), 433\u2013460.","journal-title":"Mind"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.826"},{"key":"e_1_3_2_61_2","first-page":"38975","volume-title":"NeurIPS","author":"Valmeekam Karthik","year":"2024","unstructured":"Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. 2024. PlanBench: An extensible benchmark for evaluating large language models on planning and reasoning about change. In NeurIPS, 38975\u201338987."},{"key":"e_1_3_2_62_2","first-page":"21","volume-title":"Connection Science","volume":"16","author":"Velde Frank van der","year":"2004","unstructured":"Frank van der Velde, Gwendid T van der Voort van der Kleij, and Marc de Kamps. 2004. Lack of combinatorial productivity in language processing with simple recurrent networks. 
Connection Science 16, 1 (2004), 21\u201346."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.153"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-5813"},{"key":"e_1_3_2_65_2","volume-title":"ICLR","author":"Wang Ruocheng","year":"2024","unstructured":"Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, and Noah D. Goodman. 2024. Hypothesis search: Inductive reasoning with language models. In ICLR."},{"key":"e_1_3_2_66_2","volume-title":"ICLR","author":"Wang Xuezhi","year":"2023","unstructured":"Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-consistency improves chain of thought reasoning in language models. In ICLR."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-naacl.252"},{"key":"e_1_3_2_68_2","first-page":"24824","volume-title":"NeurIPS","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, 24824\u201324837."},{"key":"e_1_3_2_69_2","volume-title":"ICLR","author":"West Peter","year":"2024","unstructured":"Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, et\u00a0al. 2024. The generative AI paradox: \u201cWhat it can create, it may not understand\u201d. In ICLR."},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.2307\/3822945"},{"key":"e_1_3_2_71_2","unstructured":"Johan Sokrates Wind. 2020. ARC-Solution. Retrieved from https:\/\/github.com\/top-quarks\/ARC-solution"},{"key":"e_1_3_2_72_2","volume-title":"NeurIPS","author":"Wu Bo","year":"2021","unstructured":"Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, and Chuang Gan. 2021. 
STAR: A benchmark for situated reasoning in Real-world videos. In NeurIPS."},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i4.25527"},{"key":"e_1_3_2_74_2","volume-title":"and the importance of object-based representations","author":"Xu Yudong","year":"2023","unstructured":"Yudong Xu, Wenhao Li, Pashootan Vaezipoor, Scott Sanner, and Elias B. Khalil. 2023. LLMs and the abstraction and reasoning corpus: Successes, failures, and the importance of object-based representations. TMLR."},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-022-00583-4"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.naacl-srw.3"},{"key":"e_1_3_2_77_2","first-page":"11809","volume-title":"NeurIPS","author":"Yao Shunyu","year":"2023","unstructured":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. In NeurIPS, 11809\u201311822."},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00546"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i25.34876"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.197"},{"key":"e_1_3_2_81_2","volume-title":"ICLR","author":"Zhou Denny","year":"2023","unstructured":"Denny Zhou, Nathanael Sch\u00e4rli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et\u00a0al. 2023. Least-to-most prompting enables complex reasoning in large language models. In ICLR."},{"key":"e_1_3_2_82_2","unstructured":"Hattie Zhou Azade Nova Hugo Larochelle Aaron Courville Behnam Neyshabur and Hanie Sedghi. 2022. Teaching algorithmic reasoning via in-context learning. arXiv:2211.09066. 
Retrieved from https:\/\/arxiv.org\/abs\/arXiv:2211.09066"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3712701","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T15:07:48Z","timestamp":1763996868000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712701"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,24]]},"references-count":81,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3712701"],"URL":"https:\/\/doi.org\/10.1145\/3712701","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,24]]},"assertion":[{"value":"2024-03-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}