{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T15:19:44Z","timestamp":1778858384866,"version":"3.51.4"},"reference-count":360,"publisher":"Association for Computing Machinery (ACM)","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"abstract":"<jats:p>Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, e.g. 
Large Language Models (LLMs), and contribute to the development of AGI.<\/jats:p>","DOI":"10.1145\/3729218","type":"journal-article","created":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T10:57:06Z","timestamp":1744369026000},"update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-8356-7505","authenticated-orcid":false,"given":"Jiankai","family":"Sun","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6652-2026","authenticated-orcid":false,"given":"Chuanyang","family":"Zheng","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6890-1049","authenticated-orcid":false,"given":"Enze","family":"Xie","sequence":"additional","affiliation":[{"name":"The University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6385-6082","authenticated-orcid":false,"given":"Zhengying","family":"Liu","sequence":"additional","affiliation":[{"name":"Noah Ark's Lab,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9057-745X","authenticated-orcid":false,"given":"Ruihang","family":"Chu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4166-3428","authenticated-orcid":false,"given":"Jianing","family":"Qiu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong 
Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1279-6782","authenticated-orcid":false,"given":"Jiaqi","family":"Xu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6556-8359","authenticated-orcid":false,"given":"Mingyu","family":"Ding","sequence":"additional","affiliation":[{"name":"The University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9110-5534","authenticated-orcid":false,"given":"Hongyang","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai AI Lab,  Shanghai China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7886-439X","authenticated-orcid":false,"given":"Mengzhe","family":"Geng","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2305-6181","authenticated-orcid":false,"given":"Yue","family":"Wu","sequence":"additional","affiliation":[{"name":"Noah Ark's Lab,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2418-3134","authenticated-orcid":false,"given":"Wenhai","family":"Wang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1288-6657","authenticated-orcid":false,"given":"Junsong","family":"Chen","sequence":"additional","affiliation":[{"name":"Dalian University of Technology,  Dalian, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0620-8271","authenticated-orcid":false,"given":"Zhangyue","family":"Yin","sequence":"additional","affiliation":[{"name":"Fudan University,  Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0432-5510","authenticated-orcid":false,"given":"Xiaozhe","family":"Ren","sequence":"additional","affiliation":[{"name":"Noah Ark's Lab,  Hong Kong China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4494-843X","authenticated-orcid":false,"given":"Jie","family":"Fu","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9559-6941","authenticated-orcid":false,"given":"Junxian","family":"He","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9405-519X","authenticated-orcid":false,"given":"Yuan","family":"Wu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4608-5778","authenticated-orcid":false,"given":"Qi","family":"Liu","sequence":"additional","affiliation":[{"name":"The University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1831-9952","authenticated-orcid":false,"given":"Xihui","family":"Liu","sequence":"additional","affiliation":[{"name":"The University of Hong Kong,  Hong Kong Hong 
Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3664-6722","authenticated-orcid":false,"given":"Yu","family":"Li","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2261-9122","authenticated-orcid":false,"given":"Hao","family":"Dong","sequence":"additional","affiliation":[{"name":"Peking University,  Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0019-2570","authenticated-orcid":false,"given":"Yu","family":"Cheng","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9809-3430","authenticated-orcid":false,"given":"Ming","family":"Zhang","sequence":"additional","affiliation":[{"name":"Peking University,  Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3055-5034","authenticated-orcid":false,"given":"Pheng Ann","family":"Heng","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6785-0785","authenticated-orcid":false,"given":"Jifeng","family":"Dai","sequence":"additional","affiliation":[{"name":"Tsinghua University,  Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6685-7950","authenticated-orcid":false,"given":"Ping","family":"Luo","sequence":"additional","affiliation":[{"name":"The University of Hong Kong,  Hong Kong Hong 
Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4888-4445","authenticated-orcid":false,"given":"Jingdong","family":"Wang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology,  Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9777-9676","authenticated-orcid":false,"given":"Ji-Rong","family":"Wen","sequence":"additional","affiliation":[{"name":"Renmin University of China,  Beijing China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7163-5247","authenticated-orcid":false,"given":"Xipeng","family":"Qiu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8401-282X","authenticated-orcid":false,"given":"Yike","family":"Guo","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology,  Hong Kong Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6016-6465","authenticated-orcid":false,"given":"Hui","family":"Xiong","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus,  Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7000-1792","authenticated-orcid":false,"given":"Qun","family":"Liu","sequence":"additional","affiliation":[{"name":"Noah's Ark Lab,  Hong Kong United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8492-3069","authenticated-orcid":false,"given":"Zhenguo","family":"Li","sequence":"additional","affiliation":[{"name":"Noah Ark's Lab,  Hong Kong Hong 
Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,4,11]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"et\u00a0al","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia\u00a0Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et\u00a0al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774(2023)."},{"key":"e_1_2_1_2_1","volume-title":"Let\u2019s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs","author":"Aggarwal Pranjal","unstructured":"Pranjal Aggarwal, Aman Madaan, Yiming Yang, and Mausam. 2023. Let\u2019s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs. In EMNLP. Association for Computational Linguistics, Singapore, 12375\u201312396."},{"key":"e_1_2_1_3_1","volume-title":"Not As I Say: Grounding Language in Robotic Affordances. In Conf. on Robot Learning.","author":"\u00a0al Michael Ahn","year":"2022","unstructured":"Michael Ahn et\u00a0al. 2022. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. In Conf. on Robot Learning."},{"key":"e_1_2_1_4_1","first-page":"23716","article-title":"Flamingo: a visual language model for few-shot learning","volume":"35","author":"Alayrac Jean-Baptiste","year":"2022","unstructured":"Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et\u00a0al. 2022. Flamingo: a visual language model for few-shot learning. NeurIPS 35(2022), 23716\u201323736.","journal-title":"NeurIPS"},{"key":"e_1_2_1_5_1","volume-title":"Gesture in reasoning: An embodied perspective","author":"Alibali W","unstructured":"Martha\u00a0W Alibali, Rebecca Boncoddo, and Autumn\u00a0B Hostetter. 2014. Gesture in reasoning: An embodied perspective. 
In The Routledge handbook of embodied cognition. Routledge, 150\u2013159."},{"key":"e_1_2_1_6_1","unstructured":"Yuvanesh Anand Zach Nussbaum Brandon Duderstadt Benjamin Schmidt and Andriy Mulyar. 2023. GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo. https:\/\/github.com\/nomic-ai\/gpt4all."},{"key":"e_1_2_1_7_1","unstructured":"Vamsi Aribandi Yi Tay Tal Schuster Jinfeng Rao Huaixiu\u00a0Steven Zheng Sanket\u00a0Vaibhav Mehta Honglei Zhuang Vinh\u00a0Q. Tran Dara Bahri Jianmo Ni Jai Gupta Kai Hui Sebastian Ruder and Donald Metzler. 2022. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning. In ICLR."},{"key":"e_1_2_1_8_1","volume-title":"PROST: Physical Reasoning about Objects through Space and Time. In Findings of the Association for Computational Linguistics: ACL-IJCNLP","author":"Aroca-Ouellette St\u00e9phane","year":"2021","unstructured":"St\u00e9phane Aroca-Ouellette, Cory Paik, Alessandro Roncone, and Katharina Kann. 2021. PROST: Physical Reasoning about Objects through Space and Time. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 4597\u20134608."},{"key":"e_1_2_1_9_1","unstructured":"Abi Aryan Aakash\u00a0Kumar Nain Andrew McMahon Lucas\u00a0Augusto Meyer and Harpreet\u00a0Singh Sahota. 2023. The Costly Dilemma: Generalization Evaluation and Cost-Optimal Deployment of Large Language Models. arxiv:2308.08061 \u00a0[cs.CL]"},{"key":"e_1_2_1_10_1","volume-title":"et\u00a0al","author":"Austin Jacob","year":"2021","unstructured":"Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et\u00a0al. 2021. Program synthesis with large language models. arXiv preprint arXiv:2108.07732(2021)."},{"key":"e_1_2_1_11_1","volume-title":"Llemma: An Open Language Model for Mathematics. 
In ICLR.","author":"Azerbayev Zhangir","year":"2024","unstructured":"Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco\u00a0Dos Santos, Stephen\u00a0Marcus McAleer, Albert\u00a0Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. 2024. Llemma: An Open Language Model for Mathematics. In ICLR."},{"key":"e_1_2_1_12_1","first-page":"12449","article-title":"wav2vec 2.0: A framework for self-supervised learning of speech representations","volume":"33","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. NeurIPS 33(2020), 12449\u201312460.","journal-title":"NeurIPS"},{"key":"e_1_2_1_13_1","volume-title":"Constitutional AI: Harmlessness from AI Feedback. arxiv:2212.08073 \u00a0[cs.CL]","author":"\u00a0al Yuntao Bai","year":"2022","unstructured":"Yuntao Bai et\u00a0al. 2022. Constitutional AI: Harmlessness from AI Feedback. arxiv:2212.08073 \u00a0[cs.CL]"},{"key":"e_1_2_1_14_1","unstructured":"Michiel\u00a0A. Bakker Martin\u00a0J Chadwick Hannah Sheahan Michael\u00a0Henry Tessler Lucy Campbell-Gillingham Jan Balaguer Nat McAleese Amelia Glaese John Aslanides Matthew Botvinick and Christopher Summerfield. 2022. Fine-tuning language models to find agreement among humans with diverse preferences. In NeurIPS."},{"key":"e_1_2_1_15_1","volume-title":"et\u00a0al","author":"Barras Bruno","year":"1997","unstructured":"Bruno Barras, Samuel Boutin, Cristina Cornes, Judica\u00ebl Courant, Jean-Christophe Filliatre, Eduardo Gimenez, Hugo Herbelin, Gerard Huet, Cesar Munoz, Chetan Murthy, et\u00a0al. 1997. The Coq proof assistant reference manual: Version 6.1. Ph.\u00a0D. Dissertation. 
Inria."},{"key":"e_1_2_1_16_1","volume-title":"et\u00a0al","author":"Bear M","year":"2021","unstructured":"Daniel\u00a0M Bear, Elias Wang, Damian Mrowca, Felix\u00a0J Binder, Hsiao-Yu\u00a0Fish Tung, RT Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun Sun, et\u00a0al. 2021. Physion: Evaluating physical prediction from vision in humans and machines. In NeurIPS. 18102\u201318112."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1571-0661(05)82598-8"},{"key":"e_1_2_1_18_1","volume-title":"\u201cA is B","author":"Berglund Lukas","unstructured":"Lukas Berglund, Meg Tong, Maximilian Kaufmann, Mikita Balesni, Asa\u00a0Cooper Stickland, Tomasz Korbak, and Owain Evans. 2024. The Reversal Curse: LLMs trained on \u201cA is B\u201d fail to learn \u201cB is A\u201d. In ICLR."},{"key":"e_1_2_1_19_1","volume-title":"et\u00a0al","author":"Betker James","year":"2023","unstructured":"James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, et\u00a0al. 2023. Improving image generation with better captions. Computer Science. https:\/\/cdn. openai. com\/papers\/dall-e-3. pdf (2023)."},{"key":"e_1_2_1_20_1","volume-title":"Accurate medium-range global weather forecasting with 3D neural networks. Nature","author":"Bi Kaifeng","year":"2023","unstructured":"Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. 2023. Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023), 1\u20136."},{"key":"e_1_2_1_21_1","volume-title":"et\u00a0al","author":"Bisk Yonatan","year":"2020","unstructured":"Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et\u00a0al. 2020. Piqa: Reasoning about physical commonsense in natural language. In AAAI, Vol.\u00a0 34. 7432\u20137439."},{"key":"e_1_2_1_22_1","unstructured":"Rishi Bommasani et\u00a0al. 2021. On the Opportunities and Risks of Foundation Models. 
arxiv:2108.07258 \u00a0[cs.LG]"},{"key":"e_1_2_1_23_1","volume-title":"et\u00a0al","author":"Brohan Anthony","year":"2023","unstructured":"Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et\u00a0al. 2023. Rt-1: Robotics transformer for real-world control at scale. RSS (2023)."},{"key":"e_1_2_1_24_1","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared\u00a0D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. NeurIPS 33(2020), 1877\u20131901.","journal-title":"NeurIPS"},{"key":"e_1_2_1_25_1","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin\u00a0Tat Lee Yuanzhi Li Scott Lundberg Harsha Nori Hamid Palangi Marco\u00a0Tulio Ribeiro and Yi Zhang. 2023. Sparks of Artificial General Intelligence: Early experiments with GPT-4. arxiv:2303.12712 \u00a0[cs.CL]"},{"key":"e_1_2_1_26_1","volume-title":"Monet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390(2019).","author":"Burgess P","year":"2019","unstructured":"Christopher\u00a0P Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, and Alexander Lerchner. 2019. Monet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390(2019)."},{"key":"e_1_2_1_27_1","unstructured":"Minwoo Byeon Beomhee Park Haecheon Kim Sungjun Lee Woonhyuk Baek and Saehoon Kim. 2022. Coyo-700m: Image-text pair dataset."},{"key":"e_1_2_1_28_1","volume-title":"Go Beyond Plain Fine-Tuning: Improving Pretrained Models for Social Commonsense. In 2021 IEEE Spoken Language Technology Workshop (SLT). 
IEEE, 1028\u20131035","author":"Chang Ting-Yun","year":"2021","unstructured":"Ting-Yun Chang, Yang Liu, Karthik Gopalakrishnan, Behnam Hedayatnia, Pei Zhou, and Dilek Hakkani-T\u00fcr. 2021. Go Beyond Plain Fine-Tuning: Improving Pretrained Models for Social Commonsense. In 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 1028\u20131035."},{"key":"e_1_2_1_29_1","unstructured":"Chaochao Chen Xiaohua Feng Jun Zhou Jianwei Yin and Xiaolin Zheng. 2023. Federated Large Language Model: A Position Paper. arxiv:2307.08925 \u00a0[cs.LG]"},{"key":"e_1_2_1_30_1","unstructured":"Canyu Chen and Kai Shu. 2024. Can LLM-Generated Misinformation Be Detected?. In ICLR."},{"key":"e_1_2_1_31_1","volume-title":"AutoAgents: A Framework for Automatic Agent Generation. IJCAI","author":"Chen Guangyao","year":"2024","unstructured":"Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, B\u00f6rje\u00a0F Karlsson, Jie Fu, and Yemin Shi. 2024. AutoAgents: A Framework for Automatic Agent Generation. IJCAI (2024)."},{"key":"e_1_2_1_32_1","volume-title":"UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression","author":"Chen Jiaqi","year":"2022","unstructured":"Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, and Xiaodan Liang. 2022. UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression. In EMNLP. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 3313\u20133323. https:\/\/aclanthology.org\/2022.emnlp-main.218"},{"key":"e_1_2_1_33_1","volume-title":"GeoQA: A geometric question answering benchmark towards multimodal numerical reasoning. ACL","author":"Chen Jiaqi","year":"2021","unstructured":"Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric\u00a0P Xing, and Liang Lin. 2021. GeoQA: A geometric question answering benchmark towards multimodal numerical reasoning. 
ACL (2021)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Junsong Chen Jincheng YU Chongjian GE Lewei Yao Enze Xie Zhongdao Wang James Kwok Ping Luo Huchuan Lu and Zhenguo Li. 2024. PixArt-$\\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis. In ICLR.","DOI":"10.1007\/978-3-031-73411-3_5"},{"key":"e_1_2_1_35_1","volume-title":"Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et\u00a0al.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de\u00a0Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et\u00a0al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374(2021)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2022.3188113"},{"key":"e_1_2_1_37_1","unstructured":"Shouyuan Chen Sherman Wong Liangjian Chen and Yuandong Tian. 2023. Extending Context Window of Large Language Models via Positional Interpolation. arxiv:2306.15595 \u00a0[cs.CL]"},{"key":"e_1_2_1_38_1","volume-title":"Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. TMLR","author":"Chen Wenhu","year":"2023","unstructured":"Wenhu Chen, Xueguang Ma, Xinyi Wang, and William\u00a0W. Cohen. 2023. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. TMLR (2023)."},{"key":"e_1_2_1_39_1","unstructured":"Yixin Chen Shuai Zhang Boran Han and Jiaya Jia. 2023. Lightweight In-Context Tuning for Multimodal Unified Models. arxiv:2310.05109 \u00a0[cs.CV]"},{"key":"e_1_2_1_40_1","unstructured":"Zitian Chen Mingyu Ding Yikang Shen Wei Zhan Masayoshi Tomizuka Erik Learned-Miller and Chuang Gan. 2023. An efficient general-purpose modular vision model via multi-task heterogeneous training. 
arXiv preprint arXiv:2306.17165(2023)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Zitian Chen Yikang Shen Mingyu Ding Zhenfang Chen Hengshuang Zhao Erik\u00a0G Learned-Miller and Chuang Gan. 2023. Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners. In CVPR. 11828\u201311837.","DOI":"10.1109\/CVPR52729.2023.01138"},{"key":"e_1_2_1_42_1","unstructured":"Zhenfang Chen Kexin Yi Yunzhu Li Mingyu Ding Antonio Torralba Joshua\u00a0B. Tenenbaum and Chuang Gan. 2022. ComPhy: Compositional Physical Reasoning of Objects and Events from Videos. In ICLR."},{"key":"e_1_2_1_43_1","volume-title":"Findings of EMNLP","author":"Chen Zhipeng","year":"2023","unstructured":"Zhipeng Chen, Kun Zhou, Beichen Zhang, Zheng Gong, Xin Zhao, and Ji-Rong Wen. 2023. ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models. In Findings of EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, 14777\u201314790."},{"key":"e_1_2_1_44_1","volume-title":"Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/","author":"Chiang Wei-Lin","year":"2023","unstructured":"Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph\u00a0E. Gonzalez, Ion Stoica, and Eric\u00a0P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/"},{"key":"e_1_2_1_45_1","volume-title":"Icm-3d: Instantiated category modeling for 3d instance segmentation","author":"Chu Ruihang","year":"2021","unstructured":"Ruihang Chu, Yukang Chen, Tao Kong, Lu Qi, and Lei Li. 2021. Icm-3d: Instantiated category modeling for 3d instance segmentation. 
IEEE Robotics and Automation Letters(2021)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Ruihang Chu Zhengzhe Liu Xiaoqing Ye Xiao Tan Xiaojuan Qi Chi-Wing Fu and Jiaya Jia. 2023. Command-Driven Articulated Object Understanding and Manipulation. In CVPR. 8813\u20138823.","DOI":"10.1109\/CVPR52729.2023.00851"},{"key":"e_1_2_1_47_1","unstructured":"Ruihang Chu Enze Xie Shentong Mo Zhenguo Li Matthias Nie\u00dfner Chi-Wing Fu and Jiaya Jia. 2023. DiffComplete: Diffusion-based Generative 3D Shape Completion. In NeurIPS."},{"key":"e_1_2_1_48_1","volume-title":"et\u00a0al","author":"Chung Hyung\u00a0Won","year":"2024","unstructured":"Hyung\u00a0Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et\u00a0al. 2024. Scaling instruction-finetuned language models. 25, 70 (2024), 1\u201353."},{"key":"e_1_2_1_49_1","volume-title":"et\u00a0al","author":"Cobbe Karl","year":"2021","unstructured":"Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et\u00a0al. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168(2021)."},{"key":"e_1_2_1_50_1","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano Christopher Hesse and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168(2021)."},{"key":"e_1_2_1_51_1","unstructured":"Together Computer. 2023. RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset. 
https:\/\/github.com\/togethercomputer\/RedPajama-Data"},{"key":"e_1_2_1_52_1","volume-title":"et\u00a0al","author":"Conover Mike","year":"2023","unstructured":"Mike Conover, Matt Hayes, Ankit Mathur, Xiangrui Meng, Jianwei Xie, Jun Wan, Sam Shah, Ali Ghodsi, Patrick Wendell, Matei Zaharia, et\u00a0al. 2023. Free dolly: Introducing the world\u2019s first truly open instruction-tuned llm."},{"key":"e_1_2_1_53_1","unstructured":"Antonia Creswell Murray Shanahan and Irina Higgins. 2023. Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning. In ICLR."},{"key":"e_1_2_1_54_1","volume-title":"Inductive logic programming at 30. Machine Learning","author":"Cropper Andrew","year":"2022","unstructured":"Andrew Cropper, Sebastijan Duman\u010di\u0107, Richard Evans, and Stephen\u00a0H Muggleton. 2022. Inductive logic programming at 30. Machine Learning (2022), 1\u201326."},{"key":"e_1_2_1_55_1","unstructured":"Kahneman Daniel. 2017. Thinking fast and slow."},{"key":"e_1_2_1_56_1","volume-title":"Automated Deduction-CADE-25: 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings 25","author":"de Moura Leonardo","year":"2015","unstructured":"Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris Van\u00a0Doorn, and Jakob von Raumer. 2015. The Lean theorem prover (system description). In Automated Deduction-CADE-25: 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings 25. Springer, 378\u2013388."},{"key":"e_1_2_1_57_1","volume-title":"Statistical relational learning. Encyclopedia of Machine Learning(2010)","author":"De\u00a0Raedt Luc","unstructured":"Luc De\u00a0Raedt and Kristian Kersting. 2010. Statistical relational learning. Encyclopedia of Machine Learning(2010)."},{"key":"e_1_2_1_58_1","volume-title":"2009 IEEE conference on computer vision and pattern recognition. 
Ieee, 248\u2013255","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. Ieee, 248\u2013255."},{"key":"e_1_2_1_59_1","unstructured":"Karan Desai Gaurav Kaul Zubin\u00a0Trivadi Aysola and Justin Johnson. 2021. RedCaps: Web-curated image-text data created by the people for the people. In NeurIPS Datasets and Benchmarks Track."},{"key":"e_1_2_1_60_1","volume-title":"Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures. arXiv preprint arXiv:2012.08508 1","author":"Ding David","year":"2020","unstructured":"David Ding, Felix Hill, Adam Santoro, and Matt Botvinick. 2020. Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures. arXiv preprint arXiv:2012.08508 1 (2020)."},{"key":"e_1_2_1_61_1","unstructured":"Jiayu Ding Shuming Ma Li Dong Xingxing Zhang Shaohan Huang Wenhui Wang and Furu Wei. 2023. LongNet: Scaling Transformers to 1 000 000 000 Tokens. In ICLR."},{"key":"e_1_2_1_62_1","first-page":"887","article-title":"Dynamic visual reasoning by learning differentiable physics models from video and language","volume":"34","author":"Ding Mingyu","year":"2021","unstructured":"Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Josh Tenenbaum, and Chuang Gan. 2021. Dynamic visual reasoning by learning differentiable physics models from video and language. NeurIPS 34(2021), 887\u2013899.","journal-title":"NeurIPS"},{"key":"e_1_2_1_63_1","volume-title":"Conf. on Robot Learning. PMLR, 1743\u20131754","author":"Ding Mingyu","year":"2023","unstructured":"Mingyu Ding, Yan Xu, Zhenfang Chen, David\u00a0Daniel Cox, Ping Luo, Joshua\u00a0B Tenenbaum, and Chuang Gan. 2023. 
Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In Conf. on Robot Learning. PMLR, 1743\u20131754."},{"key":"e_1_2_1_64_1","doi-asserted-by":"crossref","unstructured":"Ning Ding Yulin Chen Bokai Xu Yujia Qin Shengding Hu Zhiyuan Liu Maosong Sun and Bowen Zhou. 2023. Enhancing Chat Language Models by Scaling High-quality Instructional Conversations. In EMNLP.","DOI":"10.18653\/v1\/2023.emnlp-main.183"},{"key":"e_1_2_1_65_1","volume-title":"RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment. TMLR","author":"Dong Hanze","year":"2023","unstructured":"Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, KaShun SHUM, and Tong Zhang. 2023. RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment. TMLR (2023)."},{"key":"e_1_2_1_66_1","unstructured":"Danny Driess Fei Xia Mehdi S.\u00a0M. Sajjadi Corey Lynch Aakanksha Chowdhery Brian Ichter Ayzaan Wahid Jonathan Tompson Quan Vuong Tianhe Yu Wenlong Huang Yevgen Chebotar Pierre Sermanet Daniel Duckworth Sergey Levine Vincent Vanhoucke Karol Hausman Marc Toussaint Klaus Greff Andy Zeng Igor Mordatch and Pete Florence. 2023. PaLM-E: An Embodied Multimodal Language Model. In arXiv preprint arXiv:2303.03378."},{"key":"e_1_2_1_67_1","volume-title":"et\u00a0al","author":"Du Nan","year":"2022","unstructured":"Nan Du, Yanping Huang, Andrew\u00a0M Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams\u00a0Wei Yu, Orhan Firat, et\u00a0al. 2022. Glam: Efficient scaling of language models with mixture-of-experts. In ICML. 
PMLR, 5547\u20135569."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-emnlp.179"},{"key":"e_1_2_1_69_1","volume-title":"et\u00a0al","author":"Espeholt Lasse","year":"2022","unstructured":"Lasse Espeholt, Shreya Agrawal, Casper S\u00f8nderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Rob Carver, Marcin Andrychowicz, Jason Hickey, et\u00a0al. 2022. Deep learning for twelve hour precipitation forecasts. Nature communications 13, 1 (2022), 1\u201310."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_71_1","first-page":"5232","article-title":"Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity","volume":"23","author":"Fedus William","year":"2022","unstructured":"William Fedus, Barret Zoph, and Noam Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. JMLR 23, 1 (2022), 5232\u20135270.","journal-title":"JMLR"},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","first-page":"227","DOI":"10.5840\/gfpj200425215","article-title":"Wittgenstein on philosophy of logic and mathematics","volume":"25","author":"Floyd Juliet","year":"2004","unstructured":"Juliet Floyd. 2004. Wittgenstein on philosophy of logic and mathematics. Graduate Faculty Philosophy Journal 25, 2 (2004), 227\u2013287.","journal-title":"Graduate Faculty Philosophy Journal"},{"key":"e_1_2_1_73_1","doi-asserted-by":"crossref","first-page":"380","DOI":"10.3390\/encyclopedia3010024","article-title":"Tokenization in the Theory of Knowledge","volume":"3","author":"Friedman Robert","year":"2023","unstructured":"Robert Friedman. 2023. Tokenization in the Theory of Knowledge. Encyclopedia 3, 1 (2023), 380\u2013386.","journal-title":"Encyclopedia"},{"key":"e_1_2_1_74_1","unstructured":"Daniel\u00a0Y Fu Tri Dao Khaled\u00a0Kamal Saab Armin\u00a0W Thomas Atri Rudra and Christopher Re. 2023. 
Hungry Hungry Hippos: Towards Language Modeling with State Space Models. In ICLR."},{"key":"e_1_2_1_75_1","unstructured":"Yao Fu Hao Peng Litu Ou Ashish Sabharwal and Tushar Khot. 2023. Specializing Smaller Language Models towards Multi-Step Reasoning. In ICML. PMLR 10421\u201310430."},{"key":"e_1_2_1_76_1","volume-title":"Complexity-based prompting for multi-step reasoning. ICLR","author":"Fu Yao","year":"2022","unstructured":"Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. 2022. Complexity-based prompting for multi-step reasoning. ICLR (2022)."},{"key":"e_1_2_1_77_1","volume-title":"et\u00a0al","author":"Gadre Samir\u00a0Yitzhak","year":"2023","unstructured":"Samir\u00a0Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, et\u00a0al. 2023. DataComp: In search of the next generation of multimodal datasets. NeurIPS Datasets and Benchmarks(2023)."},{"key":"e_1_2_1_78_1","unstructured":"Difei Gao Lei Ji Luowei Zhou Kevin\u00a0Qinghong Lin Joya Chen Zihan Fan and Mike\u00a0Zheng Shou. 2023. AssistGPT: A General Multi-modal Assistant that can Plan Execute Inspect and Learn. arxiv:2306.08640 \u00a0[cs.CV]"},{"key":"e_1_2_1_79_1","volume-title":"et\u00a0al","author":"Gao Leo","year":"2020","unstructured":"Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et\u00a0al. 2020. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027(2020)."},{"key":"e_1_2_1_80_1","volume-title":"Pal: Program-aided language models. In ICML. PMLR, 10764\u201310799.","author":"Gao Luyu","year":"2023","unstructured":"Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Pal: Program-aided language models. In ICML. 
PMLR, 10764\u201310799."},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10817-020-09580-x"},{"key":"e_1_2_1_82_1","volume-title":"OpenAGI: When LLM Meets Domain Experts. NeurIPS","author":"Ge Yingqiang","year":"2023","unstructured":"Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, and Yongfeng Zhang. 2023. OpenAGI: When LLM Meets Domain Experts. NeurIPS (2023)."},{"key":"e_1_2_1_83_1","volume-title":"Large Language Models Are Not Strong Abstract Reasoners. IJCAI","author":"Gendron Ga\u00ebl","year":"2024","unstructured":"Ga\u00ebl Gendron, Qiming Bao, Michael Witbrock, and Gillian Dobbie. 2024. Large Language Models Are Not Strong Abstract Reasoners. IJCAI (2024)."},{"key":"e_1_2_1_84_1","volume-title":"Koala: A Dialogue Model for Academic Research. Blog post. https:\/\/bair.berkeley.edu\/blog\/2023\/04\/03\/koala\/","author":"Geng Xinyang","year":"2023","unstructured":"Xinyang Geng, Arnav Gudibande, Hao Liu, Eric Wallace, Pieter Abbeel, Sergey Levine, and Dawn Song. 2023. Koala: A Dialogue Model for Academic Research. Blog post. https:\/\/bair.berkeley.edu\/blog\/2023\/04\/03\/koala\/"},{"key":"e_1_2_1_85_1","volume-title":"Automated Planning: theory and practice","author":"Ghallab Malik","unstructured":"Malik Ghallab, Dana Nau, and Paolo Traverso. 2004. Automated Planning: theory and practice. Elsevier."},{"key":"e_1_2_1_86_1","volume-title":"Imagebind: One embedding space to bind them all. In CVPR. 15180\u201315190.","author":"Girdhar Rohit","year":"2023","unstructured":"Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan\u00a0Vasudev Alwala, Armand Joulin, and Ishan Misra. 2023. Imagebind: One embedding space to bind them all. In CVPR. 15180\u201315190."},{"key":"e_1_2_1_87_1","volume-title":"Cater: A diagnostic dataset for compositional actions and temporal reasoning. In ICLR.","author":"Girdhar Rohit","year":"2020","unstructured":"Rohit Girdhar and Deva Ramanan. 2020. 
Cater: A diagnostic dataset for compositional actions and temporal reasoning. In ICLR."},{"key":"e_1_2_1_88_1","volume-title":"Emily de\u00a0Oliveira Santos, et\u00a0al","author":"Glazer Elliot","year":"2024","unstructured":"Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline\u00a0Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de\u00a0Oliveira Santos, et\u00a0al. 2024. Frontiermath: A benchmark for evaluating advanced mathematical reasoning in ai. arXiv preprint arXiv:2411.04872(2024)."},{"key":"e_1_2_1_89_1","unstructured":"Significant gravitas\/auto gpt. 2023. An experimental open-source attempt to make gpt-4 fully autonomous. arxiv:2305.16291 \u00a0[cs.AI]"},{"key":"e_1_2_1_90_1","volume-title":"Efficiently modeling long sequences with structured state spaces. ICLR","author":"Gu Albert","year":"2022","unstructured":"Albert Gu, Karan Goel, and Christopher R\u00e9. 2022. Efficiently modeling long sequences with structured state spaces. ICLR (2022)."},{"key":"e_1_2_1_91_1","unstructured":"Jindong Gu Zhen Han Shuo Chen Ahmad Beirami Bailan He Gengyuan Zhang Ruotong Liao Yao Qin Volker Tresp and Philip Torr. 2023. A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. arxiv:2307.12980 \u00a0[cs.CV]"},{"key":"e_1_2_1_92_1","unstructured":"Yuxian Gu Li Dong Furu Wei and Minlie Huang. 2024. MiniLLM: Knowledge Distillation of Large Language Models. In ICLR."},{"key":"e_1_2_1_93_1","first-page":"22982","article-title":"Diagonal state spaces are as effective as structured state spaces","volume":"35","author":"Gupta Ankit","year":"2022","unstructured":"Ankit Gupta, Albert Gu, and Jonathan Berant. 2022. Diagonal state spaces are as effective as structured state spaces. NeurIPS 35(2022), 22982\u201322994.","journal-title":"NeurIPS"},{"key":"e_1_2_1_94_1","doi-asserted-by":"crossref","unstructured":"Tanmay Gupta and Aniruddha Kembhavi. 2023. 
Visual programming: Compositional visual reasoning without training. In CVPR. 14953\u201314962.","DOI":"10.1109\/CVPR52729.2023.01436"},{"key":"e_1_2_1_95_1","volume-title":"EMNLP","author":"Hao Shibo","unstructured":"Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. 2023. Reasoning with Language Model is Planning with World Model. In EMNLP. ACL, Singapore."},{"key":"e_1_2_1_96_1","unstructured":"Shibo Hao Tianyang Liu Zhen Wang and Zhiting Hu. 2023. ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings. In NeurIPS."},{"key":"e_1_2_1_97_1","volume-title":"Second NASA Formal Methods Symposium, Vol.\u00a0 8. 179\u2013195","author":"Harrison John","year":"2010","unstructured":"John Harrison. 2010. Formal methods at Intel\u2014An overview. In Second NASA Formal Methods Symposium, Vol.\u00a0 8. 179\u2013195."},{"key":"e_1_2_1_98_1","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding. In ICLR."},{"key":"e_1_2_1_99_1","series-title":"Round 2","volume-title":"NeurIPS Datasets and Benchmarks Track","author":"Hendrycks Dan","unstructured":"Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. 2021. Measuring Mathematical Problem Solving With the MATH Dataset. In NeurIPS Datasets and Benchmarks Track (Round 2)."},{"key":"e_1_2_1_100_1","volume-title":"Measuring Mathematical Problem Solving With the MATH Dataset. NIPS","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. 2021. Measuring Mathematical Problem Solving With the MATH Dataset. NIPS (2021)."},{"key":"e_1_2_1_101_1","volume-title":"Large language models are reasoning teachers. 
ACL","author":"Ho Namgyu","year":"2023","unstructured":"Namgyu Ho, Laura Schmid, and Se-Young Yun. 2023. Large language models are reasoning teachers. ACL (2023), 14852\u201314882."},{"key":"e_1_2_1_102_1","unstructured":"Yining Hong Haoyu Zhen Peihao Chen Shuhong Zheng Yilun Du Zhenfang Chen and Chuang Gan. 2023. 3D-LLM: Injecting the 3D World into Large Language Models. In NeurIPS."},{"key":"e_1_2_1_103_1","volume-title":"et\u00a0al","author":"Hongjin SU","year":"2022","unstructured":"SU Hongjin, Jungo Kasai, Chen\u00a0Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah\u00a0A Smith, et\u00a0al. 2022. Selective Annotation Makes Language Models Better Few-Shot Learners. In ICLR."},{"key":"e_1_2_1_104_1","volume-title":"Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. In ACL. ACL, 14409\u201314428.","author":"Honovich Or","year":"2023","unstructured":"Or Honovich, Thomas Scialom, Omer Levy, and Timo Schick. 2023. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. In ACL. ACL, 14409\u201314428."},{"key":"e_1_2_1_105_1","volume-title":"ACL. Association for Computational Linguistics","author":"Hsieh Cheng-Yu","unstructured":"Cheng-Yu Hsieh, Chun-Liang Li, Chih-kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alex Ratner, Ranjay Krishna, Chen-Yu Lee, and Tomas Pfister. 2023. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. In ACL. Association for Computational Linguistics, Toronto, Canada, 8003\u20138017."},{"key":"e_1_2_1_106_1","doi-asserted-by":"crossref","first-page":"3451","DOI":"10.1109\/TASLP.2021.3122291","article-title":"HuBERT: Self-supervised speech representation learning by masked prediction of hidden units","volume":"29","author":"Hsu Wei-Ning","year":"2021","unstructured":"Wei-Ning Hsu, Benjamin Bolte, Yao-Hung\u00a0Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. 2021. 
HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE\/ACM Transactions on Audio, Speech, and Language Processing 29 (2021), 3451\u20133460.","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"key":"e_1_2_1_107_1","unstructured":"Yen-Chang Hsu Ting Hua Sungen Chang Qian Lou Yilin Shen and Hongxia Jin. 2022. Language model compression with weighted low-rank factorization. In ICLR."},{"key":"e_1_2_1_108_1","unstructured":"Mengkang Hu Yao Mu Xinmiao\u00a0Chelsey Yu Mingyu Ding Shiguang Wu Wenqi Shao Qiguang Chen Bin Wang Yu Qiao and Ping Luo. 2024. Tree-Planner: Efficient Close-loop Task Planning with Large Language Models. In ICLR."},{"key":"e_1_2_1_109_1","doi-asserted-by":"crossref","unstructured":"Danqing Huang Shuming Shi Chin-Yew Lin Jian Yin and Wei-Ying Ma. 2016. How well do computers solve math word problems? large-scale dataset construction and evaluation. In ACL. 887\u2013896.","DOI":"10.18653\/v1\/P16-1084"},{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1145\/3451179"},{"key":"e_1_2_1_111_1","volume-title":"Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting. EMNLP","author":"Huang Haoyang","year":"2023","unstructured":"Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne\u00a0Xin Zhao, Ting Song, Yan Xia, and Furu Wei. 2023. Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting. EMNLP (2023)."},{"key":"e_1_2_1_112_1","doi-asserted-by":"crossref","unstructured":"Jiaxin Huang Shixiang Gu Le Hou Yuexin Wu Xuezhi Wang Hongkun Yu and Jiawei Han. 2023. Large Language Models Can Self-Improve. In EMNLP. 
ACL 1051\u20131068.","DOI":"10.18653\/v1\/2023.emnlp-main.67"},{"key":"e_1_2_1_113_1","unstructured":"Shaohan Huang Li Dong Wenhui Wang Yaru Hao Saksham Singhal Shuming Ma Tengchao Lv Lei Cui Owais\u00a0Khan Mohammed Barun Patra Qiang Liu Kriti Aggarwal Zewen Chi Johan Bjorck Vishrav Chaudhary Subhojit Som Xia Song and Furu Wei. 2023. Language Is Not All You Need: Aligning Perception with Language Models. In NeurIPS."},{"key":"e_1_2_1_114_1","unstructured":"Wenlong Huang Pieter Abbeel Deepak Pathak and Igor Mordatch. 2022. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In ICML. PMLR 9118\u20139147."},{"key":"e_1_2_1_115_1","volume-title":"Gqa: A new dataset for real-world visual reasoning and compositional question answering. In CVPR. 6700\u20136709.","author":"Hudson A","year":"2019","unstructured":"Drew\u00a0A Hudson and Christopher\u00a0D Manning. 2019. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In CVPR. 6700\u20136709."},{"key":"e_1_2_1_116_1","doi-asserted-by":"crossref","unstructured":"Tatsuro Inaba Hirokazu Kiyomaru Fei Cheng and Sadao Kurohashi. 2023. MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting. In ACL. ACL 1522\u20131532.","DOI":"10.18653\/v1\/2023.acl-short.130"},{"key":"e_1_2_1_117_1","volume-title":"et\u00a0al","author":"Iyer Srinivasan","year":"2022","unstructured":"Srinivasan Iyer, Xi\u00a0Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit\u00a0Singh Koura, et\u00a0al. 2022. Opt-iml: Scaling language model instruction meta learning through the lens of generalization. arXiv preprint arXiv:2212.12017(2022)."},{"key":"e_1_2_1_118_1","volume-title":"Adaptive mixtures of local experts. Neural computation 3, 1","author":"Jacobs A","year":"1991","unstructured":"Robert\u00a0A Jacobs, Michael\u00a0I Jordan, Steven\u00a0J Nowlan, and Geoffrey\u00a0E Hinton. 1991. 
Adaptive mixtures of local experts. Neural computation 3, 1 (1991), 79\u201387."},{"key":"e_1_2_1_119_1","volume-title":"Spatial Transformer Networks. NeurIPS","author":"Jaderberg Max","year":"2016","unstructured":"Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. 2016. Spatial Transformer Networks. NeurIPS (2016)."},{"key":"e_1_2_1_120_1","volume-title":"6th Conference on Artificial Intelligence and Theorem Proving. 378\u2013392","author":"Jiang Albert\u00a0Qiaochu","year":"2021","unstructured":"Albert\u00a0Qiaochu Jiang, Wenda Li, Jesse\u00a0Michael Han, and Yuhuai Wu. 2021. LISA: Language models of ISAbelle proofs. In 6th Conference on Artificial Intelligence and Theorem Proving. 378\u2013392."},{"key":"e_1_2_1_121_1","volume-title":"Emma\u00a0Bou Hanna, Florian Bressand, et\u00a0al.","author":"Jiang Q","year":"2024","unstructured":"Albert\u00a0Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra\u00a0Singh Chaplot, Diego de\u00a0las Casas, Emma\u00a0Bou Hanna, Florian Bressand, et\u00a0al. 2024. Mixtral of experts. arXiv preprint arXiv:2401.04088(2024)."},{"key":"e_1_2_1_122_1","unstructured":"Albert\u00a0Qiaochu Jiang Sean Welleck Jin\u00a0Peng Zhou Timothee Lacroix Jiacheng Liu Wenda Li Mateja Jamnik Guillaume Lample and Yuhuai Wu. 2023. Draft Sketch and Prove: Guiding Formal Theorem Provers with Informal Proofs. In ICLR."},{"key":"e_1_2_1_123_1","doi-asserted-by":"crossref","unstructured":"Dongwei Jiang Wubo Li Miao Cao Wei Zou and Xiangang Li. 2020. Speech SIMCLR: Combining contrastive and reconstruction objective for self-supervised speech representation learning. In INTERSPEECH. 1544\u20131548.","DOI":"10.21437\/Interspeech.2021-391"},{"key":"e_1_2_1_124_1","doi-asserted-by":"crossref","unstructured":"Ran Jiao Zhaowei Wang Ruihang Chu Mingjie Dong Yongfeng Rong and Wusheng Chou. 2020. An intuitive end-to-end human-UAV interaction system for field exploration. 
Frontiers in Neurorobotics(2020).","DOI":"10.3389\/fnbot.2019.00117"},{"key":"e_1_2_1_125_1","unstructured":"Mehran Kazemi Quan Yuan Deepti Bhatia Najoung Kim Xin Xu Vaiva Imbrasaite and Deepak Ramachandran. 2023. BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information. In NeurIPS Datasets and Benchmarks Track."},{"key":"e_1_2_1_126_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2020\/7346763","article-title":"Formal verification of hardware components in critical systems","volume":"2020","author":"Khan Wilayat","year":"2020","unstructured":"Wilayat Khan, Muhammad Kamran, Syed\u00a0Rameez Naqvi, Farrukh\u00a0Aslam Khan, Ahmed\u00a0S Alghamdi, and Eesa Alsolami. 2020. Formal verification of hardware components in critical systems. Wireless Communications and Mobile Computing 2020 (2020), 1\u201315.","journal-title":"Wireless Communications and Mobile Computing"},{"key":"e_1_2_1_127_1","volume-title":"EMNLP 2020","author":"Khashabi Daniel","year":"2020","unstructured":"Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. 2020. UNIFIEDQA: Crossing Format Boundaries with a Single QA System. In EMNLP 2020. ACL, Online, 1896\u20131907."},{"key":"e_1_2_1_128_1","unstructured":"Seungone Kim Se\u00a0June Joo Doyoung Kim Joel Jang Seonghyeon Ye Jamin Shin and Minjoon Seo. 2023. The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning. In EMNLP."},{"key":"e_1_2_1_129_1","first-page":"22199","article-title":"Large language models are zero-shot reasoners","volume":"35","author":"Kojima Takeshi","year":"2022","unstructured":"Takeshi Kojima, Shixiang\u00a0Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. NeurIPS 35(2022), 22199\u201322213.","journal-title":"NeurIPS"},{"key":"e_1_2_1_130_1","unstructured":"Michal Kosinski. 2023. 
Theory of Mind May Have Spontaneously Emerged in Large Language Models. arxiv:2302.02083 \u00a0[cs.CL]"},{"key":"e_1_2_1_131_1","volume-title":"ACL","author":"Kushman Nate","unstructured":"Nate Kushman, Yoav Artzi, Luke Zettlemoyer, and Regina Barzilay. 2014. Learning to Automatically Solve Algebra Word Problems. In ACL. ACL, Baltimore, Maryland, 271\u2013281."},{"key":"e_1_2_1_132_1","unstructured":"Abdullatif K\u00f6ksal Timo Schick Anna Korhonen and Hinrich Sch\u00fctze. 2023. LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction. arxiv:2304.08460 \u00a0[cs.CL]"},{"key":"e_1_2_1_133_1","unstructured":"Andreas K\u00f6pf Yannic Kilcher Dimitri von R\u00fctte Sotiris Anagnostidis Zhi-Rui Tam Keith Stevens Abdullah Barhoum Nguyen\u00a0Minh Duc Oliver Stanley Rich\u00e1rd Nagyfi Shahul ES Sameer Suri David Glushkov Arnav Dantuluri Andrew Maguire Christoph Schuhmann Huu Nguyen and Alexander Mattick. 2023. OpenAssistant Conversations \u2013 Democratizing Large Language Model Alignment. NeurIPS Datasets and Benchmark Track(2023)."},{"key":"e_1_2_1_134_1","doi-asserted-by":"crossref","unstructured":"Shibamouli Lahiri. 2014. Complexity of Word Collocation Networks: A Preliminary Structural Analysis. In ACL. ACL Gothenburg Sweden 96\u2013105.","DOI":"10.3115\/v1\/E14-3011"},{"key":"e_1_2_1_135_1","volume-title":"Teven Le\u00a0Scao, Leandro Von\u00a0Werra, Chenghao Mou, Eduardo Gonz\u00e1lez\u00a0Ponferrada, Huu Nguyen, et\u00a0al.","author":"Lauren\u00e7on Hugo","year":"2022","unstructured":"Hugo Lauren\u00e7on, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova\u00a0del Moral, Teven Le\u00a0Scao, Leandro Von\u00a0Werra, Chenghao Mou, Eduardo Gonz\u00e1lez\u00a0Ponferrada, Huu Nguyen, et\u00a0al. 2022. The bigscience roots corpus: A 1.6 tb composite multilingual dataset. 
NeurIPS 35(2022), 31809\u201331826."},{"key":"e_1_2_1_136_1","first-page":"4843","article-title":"Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis","volume":"35","author":"Laurent Jonathan","year":"2022","unstructured":"Jonathan Laurent and Andr\u00e9 Platzer. 2022. Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis. NeurIPS 35(2022), 4843\u20134856.","journal-title":"NeurIPS"},{"key":"e_1_2_1_137_1","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics. 669\u2013683","author":"Lee Young-Jun","year":"2022","unstructured":"Young-Jun Lee, Chae-Gyun Lim, and Ho-Jin Choi. 2022. Does GPT-3 generate empathetic dialogues? A novel in-context example selection method and automatic evaluation metric for empathetic dialogue generation. In Proceedings of the 29th International Conference on Computational Linguistics. 669\u2013683."},{"key":"e_1_2_1_138_1","unstructured":"Dmitry Lepikhin HyoukJoong Lee Yuanzhong Xu Dehao Chen Orhan Firat Yanping Huang Maxim Krikun Noam Shazeer and Zhifeng Chen. 2021. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. In ICLR."},{"key":"e_1_2_1_139_1","volume-title":"Diverse demonstrations improve in-context compositional generalization. ACL","author":"Levy Itay","year":"2023","unstructured":"Itay Levy, Ben Bogin, and Jonathan Berant. 2023. Diverse demonstrations improve in-context compositional generalization. ACL (2023)."},{"key":"e_1_2_1_140_1","volume-title":"Large Multimodal Models: Notes on CVPR 2023 Tutorial. arxiv:2306","author":"Li Chunyuan","year":"2023","unstructured":"Chunyuan Li. 2023. Large Multimodal Models: Notes on CVPR 2023 Tutorial. 
arxiv:2306.14895 \u00a0[cs.CV]"},{"key":"e_1_2_1_141_1","volume-title":"Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem.","author":"Li Guohao","year":"2023","unstructured":"Guohao Li, Hasan Abed Al\u00a0Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023. CAMEL: Communicative agents for \u201cmind\u201d exploration of large language model society. In NeurIPS."},{"key":"e_1_2_1_142_1","volume-title":"ICML. PMLR","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In ICML. PMLR, 19730\u201319742."},{"key":"e_1_2_1_143_1","volume-title":"Counterfactual reasoning: Testing language models","author":"Li Jiaxuan","year":"2023","unstructured":"Jiaxuan Li, Lang Yu, and Allyson Ettinger. 2023. Counterfactual reasoning: Testing language models\u2019 understanding of hypothetical scenarios. ACL (2023)."},{"key":"e_1_2_1_144_1","volume-title":"et\u00a0al","author":"Li Shiyang","year":"2024","unstructured":"Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, et\u00a0al. 2024. Explanations from large language models make small reasoners better. AAAI (2024)."},{"key":"e_1_2_1_145_1","volume-title":"Vision-Language Foundation Models as Effective Robot Imitators. ICLR","author":"Li Xinghang","year":"2024","unstructured":"Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, and Tao Kong. 2024. Vision-Language Foundation Models as Effective Robot Imitators. ICLR (2024)."},{"key":"e_1_2_1_146_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_147_1","volume-title":"Finding supporting examples for in-context learning. EMNLP","author":"Li Xiaonan","year":"2023","unstructured":"Xiaonan Li and Xipeng Qiu. 2023. 
Finding supporting examples for in-context learning. EMNLP (2023)."},{"key":"e_1_2_1_148_1","volume-title":"Self-Alignment with Instruction Backtranslation. ICLR","author":"Li Xian","year":"2024","unstructured":"Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Luke Zettlemoyer, Omer Levy, Jason Weston, and Mike Lewis. 2024. Self-Alignment with Instruction Backtranslation. ICLR (2024)."},{"key":"e_1_2_1_149_1","unstructured":"Xiang\u00a0Lorraine Li Adhiguna Kuncoro Jordan Hoffmann Cyprien de Masson\u00a0d\u2019Autume Phil Blunsom and Aida Nematzadeh. 2022. A systematic investigation of commonsense knowledge in large language models. In EMNLP. 11838\u201311855."},{"key":"e_1_2_1_150_1","unstructured":"Yifan Li Yifan Du Kun Zhou Jinpeng Wang Xin Zhao and Ji-Rong Wen. 2023. Evaluating Object Hallucination in Large Vision-Language Models. In EMNLP. ACL 292\u2013305."},{"key":"e_1_2_1_151_1","unstructured":"Yiming Li Tao Kong Ruihang Chu Yifeng Li Peng Wang and Lei Li. 2021. Simultaneous semantic and collision learning for 6-dof grasp pose estimation. In IROS."},{"key":"e_1_2_1_152_1","unstructured":"Yifei Li Zeqi Lin Shizhuo Zhang Qiang Fu Bei Chen Jian-Guang Lou and Weizhu Chen. 2022. On the advance of making language models better reasoners. arXiv preprint arXiv:2206.02336(2022)."},{"key":"e_1_2_1_153_1","unstructured":"Yanwei Li Chengyao Wang and Jiaya Jia. 2024. LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models. ECCV."},{"key":"e_1_2_1_154_1","volume-title":"Code as policies: Language model programs for embodied control","author":"Liang Jacky","unstructured":"Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. 2023. Code as policies: Language model programs for embodied control. In ICRA. IEEE, 9493\u20139500."},{"key":"e_1_2_1_155_1","unstructured":"Bill\u00a0Yuchen Lin Yicheng Fu Karina Yang Faeze Brahman Shiyu Huang Chandra Bhagavatula Prithviraj Ammanabrolu Yejin Choi and Xiang Ren. 
2023. SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks. In NeurIPS."},{"key":"e_1_2_1_156_1","unstructured":"Zhixuan Lin Yi-Fu Wu Skand Peri Bofeng Fu Jindong Jiang and Sungjin Ahn. 2020. Improving generative imagination in object-centric world models. In ICML. JMLR.org Article 570 10\u00a0pages."},{"key":"e_1_2_1_157_1","volume-title":"Association for Computational Linguistics","author":"Ling Wang","unstructured":"Wang Ling, Dani Yogatama, Chris Dyer, and Phil Blunsom. 2017. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems. In ACL, Regina Barzilay and Min-Yen Kan (Eds.). Association for Computational Linguistics, Vancouver, Canada, 158\u2013167."},{"key":"e_1_2_1_158_1","unstructured":"Fangyu Liu Julian\u00a0Martin Eisenschlos Francesco Piccinno Syrine Krichene Chenxi Pang Kenton Lee Mandar Joshi Wenhu Chen Nigel Collier and Yasemin Altun. 2023. DePlot: One-shot visual language reasoning by plot-to-table translation. In ACL. https:\/\/arxiv.org\/abs\/2212.10505"},{"key":"e_1_2_1_159_1","unstructured":"Fangyu Liu Francesco Piccinno Syrine Krichene Chenxi Pang Kenton Lee Mandar Joshi Yasemin Altun Nigel Collier and Julian\u00a0Martin Eisenschlos. 2023. MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering. In ACL. https:\/\/arxiv.org\/abs\/2212.09662"},{"key":"e_1_2_1_160_1","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong\u00a0Jae Lee. 2023. Visual Instruction Tuning. In NeurIPS."},{"key":"e_1_2_1_161_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.deelio-1.10"},{"key":"e_1_2_1_162_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00638"},{"key":"e_1_2_1_163_1","volume-title":"RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees","author":"Liu Tengxiao","year":"2022","unstructured":"Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu, and Zheng Zhang. 2022. 
RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees. In EMNLP, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 7177\u20137189."},{"key":"e_1_2_1_164_1","doi-asserted-by":"crossref","unstructured":"Tengxiao Liu Qipeng Guo Yuqing Yang Xiangkun Hu Yue Zhang Xipeng Qiu and Zheng Zhang. 2023. Plan Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts. In EMNLP. ACL 2807\u20132822.","DOI":"10.18653\/v1\/2023.emnlp-main.169"},{"key":"e_1_2_1_165_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_166_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692(2019).","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692(2019)."},{"key":"e_1_2_1_167_1","unstructured":"Jieyi Long. 2023. Large Language Model Guided Tree-of-Thought. arxiv:2305.08291 \u00a0[cs.AI]"},{"key":"e_1_2_1_168_1","volume-title":"et\u00a0al","author":"Longpre Shayne","year":"2023","unstructured":"Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung\u00a0Won Chung, Yi Tay, Denny Zhou, Quoc\u00a0V Le, Barret Zoph, Jason Wei, et\u00a0al. 2023. The flan collection: Designing data and methods for effective instruction tuning. ICML (2023)."},{"key":"e_1_2_1_169_1","volume-title":"Inter-GPS: Interpretable geometry problem solving with formal language and symbolic reasoning. ACL","author":"Lu Pan","year":"2021","unstructured":"Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, and Song-Chun Zhu. 2021. Inter-GPS: Interpretable geometry problem solving with formal language and symbolic reasoning. 
ACL (2021)."},{"key":"e_1_2_1_170_1","unstructured":"Pan Lu Swaroop Mishra Tony Xia Liang Qiu Kai-Wei Chang Song-Chun Zhu Oyvind Tafjord Peter Clark and Ashwin Kalyan. 2022. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering. In NeurIPS."},{"key":"e_1_2_1_171_1","volume-title":"Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models. In NeurIPS.","author":"Lu Pan","year":"2023","unstructured":"Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying\u00a0Nian Wu, Song-Chun Zhu, and Jianfeng Gao. 2023. Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models. In NeurIPS."},{"key":"e_1_2_1_172_1","unstructured":"Pan Lu Liang Qiu Kai-Wei Chang Ying\u00a0Nian Wu Song-Chun Zhu Tanmay Rajpurohit Peter Clark and Ashwin Kalyan. 2023. Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning. In ICLR."},{"key":"e_1_2_1_173_1","volume-title":"Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. arXiv preprint arXiv:2308.09583(2023).","author":"Luo Haipeng","year":"2023","unstructured":"Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, and Dongmei Zhang. 2023. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. arXiv preprint arXiv:2308.09583(2023)."},{"key":"e_1_2_1_174_1","volume-title":"Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, and Chitta Baral.","author":"Luo Man","year":"2023","unstructured":"Man Luo, Shrinidhi Kumbhar, Ming shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, and Chitta Baral. 2023. Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models. 
arxiv:2310.00836 \u00a0[cs.CL]"},{"key":"e_1_2_1_175_1","unstructured":"Man Luo Xin Xu Zhuyun Dai Panupong Pasupat Mehran Kazemi Chitta Baral Vaiva Imbrasaite and Vincent\u00a0Y Zhao. 2023. Dr. ICL: Demonstration-Retrieved In-context Learning. arXiv preprint arXiv:2305.14128(2023)."},{"key":"e_1_2_1_176_1","doi-asserted-by":"publisher","unstructured":"Aman Madaan Shuyan Zhou Uri Alon Yiming Yang and Graham Neubig. 2022. Language Models of Code are Few-Shot Commonsense Learners. In EMNLP Yoav Goldberg Zornitsa Kozareva and Yue Zhang (Eds.). Association for Computational Linguistics Abu Dhabi United Arab Emirates 1384\u20131403. https:\/\/doi.org\/10.18653\/v1\/2022.emnlp-main.90","DOI":"10.18653\/v1"},{"key":"e_1_2_1_177_1","volume-title":"et\u00a0al","author":"Madani Ali","year":"2023","unstructured":"Ali Madani, Ben Krause, Eric\u00a0R Greene, Subu Subramanian, Benjamin\u00a0P Mohr, James\u00a0M Holton, Jose\u00a0Luis Olmos\u00a0Jr, Caiming Xiong, Zachary\u00a0Z Sun, Richard Socher, et\u00a0al. 2023. Large language models generate functional protein sequences across diverse families. Nature Biotechnology(2023), 1\u20138."},{"key":"e_1_2_1_178_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_179_1","volume-title":"Giorgio Giannone, Samuel\u00a0C Hoffman, Matthew Buchan, et\u00a0al.","author":"Manica Matteo","year":"2023","unstructured":"Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan\u00a0Nana Teukam, Giorgio Giannone, Samuel\u00a0C Hoffman, Matthew Buchan, et\u00a0al. 2023. Accelerating material design with the generative toolkit for scientific discovery. npj Computational Materials 9, 1 (2023), 69."},{"key":"e_1_2_1_180_1","doi-asserted-by":"publisher","DOI":"10.1162\/daed_a_01905"},{"key":"e_1_2_1_181_1","unstructured":"Jiayuan Mao Chuang Gan Pushmeet Kohli Joshua\u00a0B Tenenbaum and Jiajun Wu. 2019. 
The Neuro-Symbolic Concept Learner: Interpreting Scenes Words and Sentences From Natural Supervision. In ICLR."},{"key":"e_1_2_1_182_1","first-page":"7755","article-title":"CLEVRER-Humans: Describing Physical and Causal Events the Human Way","volume":"35","author":"Mao Jiayuan","year":"2022","unstructured":"Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah Goodman, and Jiajun Wu. 2022. CLEVRER-Humans: Describing Physical and Causal Events the Human Way. NeurIPS 35(2022), 7755\u20137768.","journal-title":"NeurIPS"},{"key":"e_1_2_1_183_1","unstructured":"Jiageng Mao Junjie Ye Yuxi Qian Marco Pavone and Yue Wang. 2024. A Language Agent for Autonomous Driving. In CoLM."},{"key":"e_1_2_1_184_1","volume-title":"A Computer Language for Mathematical Proofs. arXiv","author":"Megill Norman","year":"2019","unstructured":"Norman Megill and David\u00a0A Wheeler. 2019. A Computer Language for Mathematical Proofs. arXiv (2019)."},{"key":"e_1_2_1_185_1","unstructured":"Harsh Mehta Ankit Gupta Ashok Cutkosky and Behnam Neyshabur. 2023. Long Range Language Modeling via Gated State Spaces. In ICLR."},{"key":"e_1_2_1_186_1","volume-title":"Magnushammer: A Transformer-Based Approach to Premise Selection. In ICLR.","author":"Miku\u0142a Maciej","year":"2024","unstructured":"Maciej Miku\u0142a, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert\u00a0Q. Jiang, Jin\u00a0Peng Zhou, Christian Szegedy, \u0141ukasz Kuci\u0144ski, Piotr Mi\u0142o\u015b, and Yuhuai Wu. 2024. Magnushammer: A Transformer-Based Approach to Premise Selection. In ICLR."},{"key":"e_1_2_1_187_1","doi-asserted-by":"crossref","unstructured":"Sewon Min Mike Lewis Luke Zettlemoyer and Hannaneh Hajishirzi. 2022. MetaICL: Learning to Learn In Context. In NAACL-HLT.","DOI":"10.18653\/v1\/2022.naacl-main.201"},{"key":"e_1_2_1_188_1","doi-asserted-by":"crossref","unstructured":"Swaroop Mishra Daniel Khashabi Chitta Baral and Hannaneh Hajishirzi. 2022. 
Cross-task generalization via natural language crowdsourcing instructions. In ACL.","DOI":"10.18653\/v1\/2022.acl-long.244"},{"key":"e_1_2_1_189_1","volume-title":"et\u00a0al","author":"Mitra Arindam","year":"2024","unstructured":"Arindam Mitra, Luciano Del\u00a0Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agrawal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, et\u00a0al. 2024. Orca 2: Teaching Small Language Models How to Reason. ACL (2024)."},{"key":"e_1_2_1_190_1","volume-title":"Proceedings 28","author":"Moura Leonardo\u00a0de","year":"2021","unstructured":"Leonardo\u00a0de Moura and Sebastian Ullrich. 2021. The lean 4 theorem prover and programming language. In Automated Deduction\u2013CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12\u201315, 2021, Proceedings 28. Springer, 625\u2013635."},{"key":"e_1_2_1_191_1","unstructured":"Yao Mu Qinglong Zhang Mengkang Hu Wenhai Wang Mingyu Ding Jun Jin Bin Wang Jifeng Dai Yu Qiao and Ping Luo. 2023. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. In NeurIPS."},{"key":"e_1_2_1_192_1","volume-title":"et\u00a0al","author":"Muennighoff Niklas","year":"2023","unstructured":"Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven\u00a0Le Scao, M\u00a0Saiful Bari, Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, et\u00a0al. 2023. Crosslingual generalization through multitask finetuning. ACL (2023)."},{"key":"e_1_2_1_193_1","volume-title":"Orca: Progressive learning from complex explanation traces of gpt-4. arXiv preprint arXiv:2306.02707(2023).","author":"Mukherjee Subhabrata","year":"2023","unstructured":"Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, and Ahmed Awadallah. 2023. Orca: Progressive learning from complex explanation traces of gpt-4. 
arXiv preprint arXiv:2306.02707(2023)."},{"key":"e_1_2_1_194_1","unstructured":"Niels M\u00fcndler Jingxuan He Slobodan Jenko and Martin Vechev. 2024. Self-contradictory Hallucinations of Large Language Models: Evaluation Detection and Mitigation. In ICLR."},{"key":"e_1_2_1_195_1","doi-asserted-by":"crossref","unstructured":"Nathalia Nascimento Paulo Alencar and Donald Cowan. 2023. Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems. In ACSOS-C. 104\u2013109.","DOI":"10.1109\/ACSOS-C58168.2023.00048"},{"key":"e_1_2_1_196_1","first-page":"21455","article-title":"Quality not quantity: On the interaction between dataset design and robustness of clip","volume":"35","author":"Nguyen Thao","year":"2022","unstructured":"Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, and Ludwig Schmidt. 2022. Quality not quantity: On the interaction between dataset design and robustness of clip. NeurIPS 35(2022), 21455\u201321469.","journal-title":"NeurIPS"},{"key":"e_1_2_1_197_1","unstructured":"Xuefei Ning Zinan Lin Zixuan Zhou Zifu Wang Huazhong Yang and Yu Wang. 2024. Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation. In ICLR."},{"key":"e_1_2_1_199_1","unstructured":"Vicente Ordonez Girish Kulkarni and Tamara Berg. 2011. Im2text: Describing images using 1 million captioned photographs. NeurIPS 24(2011)."},{"key":"e_1_2_1_200_1","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et\u00a0al. 2022. Training language models to follow instructions with human feedback. NeurIPS 35(2022), 27730\u201327744.","journal-title":"NeurIPS"},{"key":"e_1_2_1_201_1","doi-asserted-by":"crossref","unstructured":"Siqi Ouyang and Lei Li. 2023. 
AutoPlan: Automatic Planning of Interactive Decision-Making Tasks With Large Language Models. In EMNLP. ACL 3114\u20133128.","DOI":"10.18653\/v1\/2023.findings-emnlp.205"},{"key":"e_1_2_1_202_1","volume-title":"et\u00a0al","author":"O\u2019Neill Abby","year":"2024","unstructured":"Abby O\u2019Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et\u00a0al. 2024. Open x-embodiment: Robotic learning datasets and rt-x models. In ICRA. IEEE, 6892\u20136903."},{"key":"e_1_2_1_203_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3004325"},{"key":"e_1_2_1_204_1","unstructured":"Liangming Pan Alon Albalak Xinyi Wang and William Wang. 2023. Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning. In EMNLP. ACL 3806\u20133824."},{"key":"e_1_2_1_205_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0030541"},{"key":"e_1_2_1_206_1","volume-title":"Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan Wind, Stanis\u0142aw Wo\u017aniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, and Rui-Jie Zhu.","author":"Peng Bo","year":"2023","unstructured":"Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Leon Derczynski, Xingjian Du, Matteo Grella, Kranthi Gv, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bart\u0142omiej Koptyra, Hayden Lau, Jiaju Lin, Krishna Sri\u00a0Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan Wind, Stanis\u0142aw Wo\u017aniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, and Rui-Jie Zhu. 2023. RWKV: Reinventing RNNs for the Transformer Era. In EMNLP. ACL, Singapore, 14048\u201314077."},{"key":"e_1_2_1_207_1","unstructured":"Baolin Peng Chunyuan Li Pengcheng He Michel Galley and Jianfeng Gao. 2023. Instruction Tuning with GPT-4. 
arXiv preprint arXiv:2304.03277(2023)."},{"key":"e_1_2_1_208_1","unstructured":"Zhiliang Peng Wenhui Wang Li Dong Yaru Hao Shaohan Huang Shuming Ma Qixiang Ye and Furu Wei. 2024. Grounding Multimodal Large Language Models to the World. In ICLR."},{"key":"e_1_2_1_209_1","doi-asserted-by":"crossref","unstructured":"Renjie Pi Jiahui Gao Shizhe Diao Rui Pan Hanze Dong Jipeng Zhang Lewei Yao Jianhua Han Hang Xu Lingpeng Kong and Tong Zhang. 2023. DetGPT: Detect What You Need via Reasoning. In EMNLP. ACL 14172\u201314189.","DOI":"10.18653\/v1\/2023.emnlp-main.876"},{"key":"e_1_2_1_210_1","volume-title":"Hyena hierarchy: Towards larger convolutional language models. ICML","author":"Poli Michael","year":"2023","unstructured":"Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel\u00a0Y Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher R\u00e9. 2023. Hyena hierarchy: Towards larger convolutional language models. ICML (2023)."},{"key":"e_1_2_1_211_1","unstructured":"Stanislas Polu Jesse\u00a0Michael Han Kunhao Zheng Mantas Baksys Igor Babuschkin and Ilya Sutskever. 2023. Formal Mathematics Statement Curriculum Learning. In ICLR."},{"key":"e_1_2_1_212_1","volume-title":"Measuring and Narrowing the Compositionality Gap in Language Models. EMNLP","author":"Press Ofir","year":"2023","unstructured":"Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah\u00a0A. Smith, and Mike Lewis. 2023. Measuring and Narrowing the Compositionality Gap in Language Models. EMNLP (2023)."},{"key":"e_1_2_1_213_1","doi-asserted-by":"crossref","unstructured":"Connor Pryor Charles Dickens Eriq Augustine Alon Albalak William Wang and Lise Getoor. 2023. NeuPSL: Neural Probabilistic Soft Logic. In IJCAI. 
4145\u20134153.","DOI":"10.24963\/ijcai.2023\/461"},{"key":"e_1_2_1_214_1","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1007\/s13218-020-00685-4","article-title":"The AAA ABox Abduction Solver: System Description","volume":"34","author":"Pukancov\u00e1 J\u00falia","year":"2020","unstructured":"J\u00falia Pukancov\u00e1 and Martin Homola. 2020. The AAA ABox Abduction Solver: System Description. KI-K\u00fcnstliche Intelligenz 34, 4 (2020), 517\u2013522.","journal-title":"KI-K\u00fcnstliche Intelligenz"},{"key":"e_1_2_1_215_1","volume-title":"CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models. In EMNLP. ACL, 6922\u20136939.","author":"Qian Cheng","year":"2023","unstructured":"Cheng Qian, Chi Han, Yi Fung, Yujia Qin, Zhiyuan Liu, and Heng Ji. 2023. CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models. In EMNLP. ACL, 6922\u20136939."},{"key":"e_1_2_1_216_1","doi-asserted-by":"crossref","unstructured":"Shuofei Qiao Honghao Gui Chengfei Lv Qianghuai Jia Huajun Chen and Ningyu Zhang. 2024. Making Language Models Better Tool Learners with Execution Feedback. In NAACL. ACL 3550\u20133568.","DOI":"10.18653\/v1\/2024.naacl-long.195"},{"key":"e_1_2_1_217_1","volume-title":"ACL","author":"Qiao Shuofei","unstructured":"Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, and Huajun Chen. 2023. Reasoning with Language Model Prompting: A Survey. In ACL. ACL, Toronto, Canada, 5368\u20135393."},{"key":"e_1_2_1_218_1","volume-title":"Tool Learning with Foundation Models. ACM Comput. Surv. 
(Nov","author":"Qin Yujia","year":"2024","unstructured":"Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Xuanhe Zhou, Yufei Huang, Chaojun Xiao, Chi Han, Yi\u00a0Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Guoliang Li, Zhiyuan Liu, and Maosong Sun. 2024. Tool Learning with Foundation Models. ACM Comput. Surv. (Nov. 2024)."},{"key":"e_1_2_1_219_1","unstructured":"Jianing Qiu Kyle Lam Guohao Li Amish Acharya Tien\u00a0Yin Wong Ara Darzi Wu Yuan and Eric\u00a0J Topol. 2024. LLM-based agentic systems in medicine and healthcare. Nature Machine Intelligence(2024) 1\u20133."},{"key":"e_1_2_1_220_1","volume-title":"et\u00a0al","author":"Qiu Jianing","year":"2024","unstructured":"Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, et\u00a0al. 2024. Development and validation of a multimodal multitask vision foundation model for generalist ophthalmic artificial intelligence. NEJM AI 1, 12 (2024), AIoa2300221."},{"key":"e_1_2_1_221_1","volume-title":"The application of multimodal large language models in medicine. The Lancet Regional Health\u2013Western Pacific 45","author":"Qiu Jianing","year":"2024","unstructured":"Jianing Qiu, Wu Yuan, and Kyle Lam. 2024. The application of multimodal large language models in medicine. The Lancet Regional Health\u2013Western Pacific 45 (2024)."},{"key":"e_1_2_1_222_1","volume-title":"et\u00a0al","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. 
Learning transferable visual models from natural language supervision. In ICML. PMLR, 8748\u20138763."},{"key":"e_1_2_1_223_1","volume-title":"et\u00a0al","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et\u00a0al. 2018. Improving language understanding by generative pre-training. arXiv (2018)."},{"key":"e_1_2_1_224_1","volume-title":"et\u00a0al","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9."},{"key":"e_1_2_1_225_1","volume-title":"et\u00a0al","author":"Rae W","year":"2021","unstructured":"Jack\u00a0W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et\u00a0al. 2021. Scaling language models: Methods, analysis & insights from training gopher. arXiv preprint arXiv:2112.11446(2021)."},{"key":"e_1_2_1_226_1","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Christopher\u00a0D Manning Stefano Ermon and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. In NeurIPS."},{"key":"e_1_2_1_227_1","unstructured":"Colin Raffel Noam Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li and Peter\u00a0J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. (2020)."},{"key":"e_1_2_1_228_1","doi-asserted-by":"crossref","unstructured":"Vipul Raheja Dhruv Kumar Ryan Koo and Dongyeop Kang. 2023. CoEdIT: Text Editing by Task-Specific Instruction Tuning. In EMNLP.","DOI":"10.18653\/v1\/2023.findings-emnlp.350"},{"key":"e_1_2_1_229_1","unstructured":"Nazneen\u00a0Fatema Rajani Bryan McCann Caiming Xiong and Richard Socher. 2019. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. In ACL. 
ACL 4932\u20134942."},{"key":"e_1_2_1_230_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_231_1","unstructured":"IBM. Redbooks. 2004. Practical Guide to the IBM Autonomic Computing Toolkit. IBM."},{"key":"e_1_2_1_232_1","volume-title":"et\u00a0al","author":"Reed Scott","year":"2022","unstructured":"Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio\u00a0Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost\u00a0Tobias Springenberg, et\u00a0al. 2022. A generalist agent. TMLR (2022)."},{"key":"e_1_2_1_233_1","unstructured":"Siyu Ren and Kenny\u00a0Q. Zhu. 2024. Low-Rank Prune-And-Factorize for Language Model Compression. Torino Italia 10822\u201310832."},{"key":"e_1_2_1_234_1","unstructured":"Tal Ridnik Emanuel Ben-Baruch Asaf Noy and Lihi Zelnik-Manor. 2021. ImageNet-21K Pretraining for the Masses. In NeurIPS Datasets and Benchmarks Track."},{"key":"e_1_2_1_235_1","doi-asserted-by":"crossref","unstructured":"Stephen Roller Emily Dinan Naman Goyal Da Ju Mary Williamson Yinhan Liu Jing Xu Myle Ott Eric\u00a0Michael Smith Y-Lan Boureau and Jason Weston. 2021. Recipes for Building an Open-Domain Chatbot. ACL 300\u2013325.","DOI":"10.18653\/v1\/2021.eacl-main.24"},{"key":"e_1_2_1_236_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In CVPR. 10684\u201310695.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_237_1","doi-asserted-by":"crossref","unstructured":"Ohad Rubin Jonathan Herzig and Jonathan Berant. 2022. Learning To Retrieve Prompts for In-Context Learning. In ACL. 
2655\u20132671.","DOI":"10.18653\/v1\/2022.naacl-main.191"},{"key":"e_1_2_1_238_1","first-page":"36479","article-title":"Photorealistic text-to-image diffusion models with deep language understanding","volume":"35","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily\u00a0L Denton, Kamyar Ghasemipour, Raphael Gontijo\u00a0Lopes, Burcu Karagol\u00a0Ayan, Tim Salimans, et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. NeurIPS 35(2022), 36479\u201336494.","journal-title":"NeurIPS"},{"key":"e_1_2_1_239_1","unstructured":"Victor Sanh Albert Webson Colin Raffel Stephen Bach Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Arun Raja Manan Dey M\u00a0Saiful Bari Canwen Xu Urmish Thakker Shanya\u00a0Sharma Sharma Eliza Szczechla Taewoon Kim Gunjan Chhablani Nihal Nayak Debajyoti Datta Jonathan Chang Mike Tian-Jian Jiang Han Wang Matteo Manica Sheng Shen Zheng\u00a0Xin Yong Harshit Pandey Rachel Bawden Thomas Wang Trishala Neeraj Jos Rozen Abheesht Sharma Andrea Santilli Thibault Fevry Jason\u00a0Alan Fries Ryan Teehan Teven\u00a0Le Scao Stella Biderman Leo Gao Thomas Wolf and Alexander\u00a0M Rush. 2022. Multitask Prompted Training Enables Zero-Shot Task Generalization. In ICLR."},{"key":"e_1_2_1_240_1","unstructured":"Maarten Sap Hannah Rashkin Derek Chen Ronan Le\u00a0Bras and Yejin Choi. 2019. Social IQa: Commonsense Reasoning about Social Interactions. In EMNLP-IJCNLP. ACL Hong Kong China 4463\u20134473."},{"key":"e_1_2_1_241_1","unstructured":"Abulhair Saparov and He He. 2023. Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought. 
In ICLR."},{"key":"e_1_2_1_242_1","volume-title":"et\u00a0al","author":"Scao Teven\u00a0Le","year":"2022","unstructured":"Teven\u00a0Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ili\u0107, Daniel Hesslow, Roman Castagn\u00e9, Alexandra\u00a0Sasha Luccioni, Fran\u00e7ois Yvon, Matthias Gall\u00e9, et\u00a0al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100(2022)."},{"key":"e_1_2_1_243_1","volume-title":"Toolformer: Language Models Can Teach Themselves to Use Tools. In NeurIPS.","author":"Schick Timo","year":"2023","unstructured":"Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. In NeurIPS."},{"key":"e_1_2_1_244_1","first-page":"25278","article-title":"Laion-5b: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et\u00a0al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. NeurIPS 35(2022), 25278\u201325294.","journal-title":"NeurIPS"},{"key":"e_1_2_1_245_1","unstructured":"Christoph Schuhmann Richard Vencu Romain Beaumont Robert Kaczmarczyk Clayton Mullis Aarush Katta Theo Coombes Jenia Jitsev and Aran Komatsuzaki. 2021. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114(2021)."},{"key":"e_1_2_1_246_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. 
arXiv preprint arXiv:1707.06347(2017)."},{"key":"e_1_2_1_247_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_248_1","unstructured":"Dhruv Shah Alexander\u00a0T Toshev Sergey Levine and brian ichter. 2022. Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning. In ICLR."},{"key":"e_1_2_1_249_1","unstructured":"Noam Shazeer Azalia Mirhoseini Krzysztof Maziarz Andy Davis Quoc Le Geoffrey Hinton and Jeff Dean. 2017. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. In ICLR."},{"key":"e_1_2_1_250_1","doi-asserted-by":"crossref","unstructured":"Jianhao Shen Yichun Yin Lin Li Lifeng Shang Xin Jiang Ming Zhang and Qun Liu. 2021. Generate & Rank: A Multi-task Framework for Math Word Problems. In EMNLP. ACL 2269\u20132279.","DOI":"10.18653\/v1\/2021.findings-emnlp.195"},{"key":"e_1_2_1_251_1","unstructured":"Yongliang Shen Kaitao Song Xu Tan Dongsheng Li Weiming Lu and Yueting Zhuang. 2023. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. In NeurIPS."},{"key":"e_1_2_1_252_1","unstructured":"Noah Shinn Federico Cassano Ashwin Gopinath Karthik\u00a0R Narasimhan and Shunyu Yao. 2023. Reflexion: language agents with verbal reinforcement learning. In NeurIPS."},{"key":"e_1_2_1_253_1","volume-title":"Progprompt: Generating situated robot task plans using large language models","author":"Singh Ishika","year":"2023","unstructured":"Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. 2023. Progprompt: Generating situated robot task plans using large language models. In ICRA. IEEE, 11523\u201311530."},{"key":"e_1_2_1_254_1","doi-asserted-by":"crossref","unstructured":"Mannat Singh Laura Gustafson Aaron Adcock Vinicius de Freitas\u00a0Reis Bugra Gedik Raj\u00a0Prateek Kosaraju Dhruv Mahajan Ross Girshick Piotr Doll\u00e1r and Laurens Van Der\u00a0Maaten. 2022. 
Revisiting weakly supervised pre-training of visual perception models. In CVPR. 804\u2013814.","DOI":"10.1109\/CVPR52688.2022.00088"},{"key":"e_1_2_1_255_1","volume-title":"et\u00a0al","author":"Singhal Karan","year":"2023","unstructured":"Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal, et\u00a0al. 2023. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617(2023)."},{"key":"e_1_2_1_257_1","doi-asserted-by":"crossref","unstructured":"Chan\u00a0Hee Song Jiaman Wu Clayton Washington Brian\u00a0M. Sadler Wei-Lun Chao and Yu Su. 2023. LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. In ICCV.","DOI":"10.1109\/ICCV51070.2023.00280"},{"key":"e_1_2_1_258_1","volume-title":"Preference ranking optimization for human alignment. AAAI","author":"Song Feifan","year":"2023","unstructured":"Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, and Houfeng Wang. 2023. Preference ranking optimization for human alignment. AAAI (2023)."},{"key":"e_1_2_1_259_1","volume-title":"An Open Multilingual Graph of General Knowledge","author":"Speer Robyn","unstructured":"Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In AAAI. AAAI Press, 4444\u20134451."},{"key":"e_1_2_1_260_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463257"},{"key":"e_1_2_1_261_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.97"},{"key":"e_1_2_1_262_1","unstructured":"Jiankai Sun Yiqi Jiang Jianing Qiu Parth\u00a0Talpur Nobel Mykel Kochenderfer and Mac Schwager. 2023. Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model. In NeurIPS."},{"key":"e_1_2_1_263_1","unstructured":"Mingjie Sun Zhuang Liu Anna Bair and J\u00a0Zico Kolter. 2024. A Simple and Effective Pruning Approach for Large Language Models. 
In ICLR."},{"key":"e_1_2_1_264_1","volume-title":"et\u00a0al","author":"Suzgun Mirac","year":"2022","unstructured":"Mirac Suzgun, Nathan Scales, Nathanael Sch\u00e4rli, Sebastian Gehrmann, Yi Tay, Hyung\u00a0Won Chung, Aakanksha Chowdhery, Quoc\u00a0V Le, Ed\u00a0H Chi, Denny Zhou, et\u00a0al. 2022. Challenging big-bench tasks and whether chain-of-thought can solve them. ACL (2022)."},{"key":"e_1_2_1_265_1","volume-title":"ACL","author":"Talmor Alon","unstructured":"Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2019. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In ACL. ACL, Minneapolis, Minnesota, 4149\u20134158."},{"key":"e_1_2_1_266_1","unstructured":"Chaofan Tao Lu Hou Wei Zhang Lifeng Shang Xin Jiang Qun Liu Ping Luo and Ngai Wong. 2022. Compression of Generative Pre-trained Language Models via Quantization. In ACL. ACL 4821\u20134836."},{"key":"e_1_2_1_267_1","first-page":"7","article-title":"Alpaca: A strong, replicable instruction-following model","volume":"3","author":"Taori Rohan","year":"2023","unstructured":"Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori\u00a0B Hashimoto. 2023. Alpaca: A strong, replicable instruction-following model. Stanford Center for Research on Foundation Models 3, 6 (2023), 7.","journal-title":"Stanford Center for Research on Foundation Models"},{"key":"e_1_2_1_268_1","volume-title":"et\u00a0al","author":"Team Gemini","year":"2023","unstructured":"Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew\u00a0M Dai, Anja Hauth, et\u00a0al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805(2023)."},{"key":"e_1_2_1_269_1","volume-title":"Multinet: Real-time joint semantic reasoning for autonomous driving. 
In 2018 IEEE intelligent vehicles symposium (IV)","author":"Teichmann Marvin","year":"2018","unstructured":"Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, and Raquel Urtasun. 2018. Multinet: Real-time joint semantic reasoning for autonomous driving. In 2018 IEEE intelligent vehicles symposium (IV). IEEE, 1013\u20131020."},{"key":"e_1_2_1_270_1","volume-title":"Propositional Reasoning via Neural Transformer Language Models. NeSy","author":"Tomasic Anthony","year":"2021","unstructured":"Anthony Tomasic, Oscar\u00a0J Romero, John Zimmerman, and Aaron Steinfeld. 2021. Propositional Reasoning via Neural Transformer Language Models. NeSy (2021)."},{"key":"e_1_2_1_271_1","unstructured":"Hugo Touvron et\u00a0al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arxiv:2307.09288 \u00a0[cs.CL]"},{"key":"e_1_2_1_272_1","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arxiv:2302.13971 \u00a0[cs.CL]"},{"key":"e_1_2_1_273_1","volume-title":"et\u00a0al","author":"Tsai Hsiang-Sheng","year":"2022","unstructured":"Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy Liu, Cheng-I Lai, Jiatong Shi, et\u00a0al. 2022. SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. In ACL. 8479\u20138492."},{"key":"e_1_2_1_274_1","unstructured":"Maria Tsimpoukelli Jacob Menick Serkan Cabi S.\u00a0M.\u00a0Ali Eslami Oriol Vinyals and Felix Hill. 2021. Multimodal Few-Shot Learning with Frozen Language Models. 
In NeurIPS."},{"key":"e_1_2_1_275_1","volume-title":"et\u00a0al","author":"Tu Tao","year":"2024","unstructured":"Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, et\u00a0al. 2024. Towards generalist biomedical ai. NEJM AI (2024)."},{"key":"e_1_2_1_276_1","doi-asserted-by":"crossref","unstructured":"Hsiao-Yu Tung Mingyu Ding Zhenfang Chen Daniel Bear Chuang Gan Joshua\u00a0B. Tenenbaum Daniel\u00a0LK Yamins Judith\u00a0E Fan and Kevin\u00a0A. Smith. 2023. Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties. In NeurIPS Datasets and Benchmarks Track.","DOI":"10.1167\/jov.23.9.5622"},{"key":"e_1_2_1_277_1","first-page":"75993","article-title":"On the planning abilities of large language models-a critical investigation","volume":"36","author":"Valmeekam Karthik","year":"2023","unstructured":"Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. 2023. On the planning abilities of large language models-a critical investigation. NeurIPS 36(2023), 75993\u201376005.","journal-title":"NeurIPS"},{"key":"e_1_2_1_278_1","doi-asserted-by":"publisher","DOI":"10.5555\/196108.196111"},{"key":"e_1_2_1_279_1","volume-title":"Mimicplay: Long-horizon imitation learning by watching human play. CoRL","author":"Wang Chen","year":"2023","unstructured":"Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, and Anima Anandkumar. 2023. Mimicplay: Long-horizon imitation learning by watching human play. CoRL (2023)."},{"key":"e_1_2_1_280_1","volume-title":"Voyager: An Open-Ended Embodied Agent with Large Language Models. TMLR","author":"Wang Guanzhi","year":"2024","unstructured":"Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2024. Voyager: An Open-Ended Embodied Agent with Large Language Models. 
TMLR (2024)."},{"key":"e_1_2_1_281_1","unstructured":"Haiming Wang Huajian Xin Chuanyang Zheng Zhengying Liu Qingxing Cao Yinya Huang Jing Xiong Han Shi Enze Xie Jian Yin Zhenguo Li and Xiaodan Liang. 2024. LEGO-Prover: Neural Theorem Proving with Growing Libraries. In ICLR."},{"key":"e_1_2_1_282_1","volume-title":"et\u00a0al","author":"Wang Haiming","year":"2023","unstructured":"Haiming Wang, Ye Yuan, Zhengying Liu, Jianhao Shen, Yichun Yin, Jing Xiong, Enze Xie, Han Shi, Yujun Li, Lin Li, et\u00a0al. 2023. DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function. In ACL. 12632\u201312646."},{"key":"e_1_2_1_283_1","volume-title":"et\u00a0al","author":"Wang Jiaqi","year":"2023","unstructured":"Jiaqi Wang, Zhengliang Liu, Lin Zhao, Zihao Wu, Chong Ma, Sigang Yu, Haixing Dai, Qiushi Yang, Yiheng Liu, Songyao Zhang, et\u00a0al. 2023. Review of Large Vision Models and Visual Prompt Engineering. arXiv preprint arXiv:2307.00855(2023)."},{"key":"e_1_2_1_284_1","volume-title":"Roy Ka-Wei Lee, and Ee-Peng Lim","author":"Wang Lei","year":"2023","unstructured":"Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, and Ee-Peng Lim. 2023. Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models. ACL (2023)."},{"key":"e_1_2_1_285_1","volume-title":"Learning to Retrieve In-Context Examples for Large Language Models. EACL","author":"Wang Liang","year":"2024","unstructured":"Liang Wang, Nan Yang, and Furu Wei. 2024. Learning to Retrieve In-Context Examples for Large Language Models. EACL (2024)."},{"key":"e_1_2_1_286_1","doi-asserted-by":"crossref","unstructured":"Peiyi Wang Lei Li Zhihong Shao Runxin Xu Damai Dai Yifei Li Deli Chen Yu Wu and Zhifang Sui. 2024. Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations. In ACL. 
ACL 9426\u20139439.","DOI":"10.18653\/v1\/2024.acl-long.510"},{"key":"e_1_2_1_287_1","volume-title":"ScienceWorld: Is your Agent Smarter than a 5th Grader?EMNLP","author":"Wang Ruoyao","year":"2022","unstructured":"Ruoyao Wang, Peter Jansen, Marc-Alexandre C\u00f4t\u00e9, and Prithviraj Ammanabrolu. 2022. ScienceWorld: Is your Agent Smarter than a 5th Grader?EMNLP (2022)."},{"key":"e_1_2_1_288_1","volume-title":"SCIBENCH: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. In The 3rd Workshop on Mathematical Reasoning and AI at NeurIPS\u201923","author":"Wang Xiaoxuan","year":"2023","unstructured":"Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun Loomba, Shichang Zhang, Yizhou Sun, and Wei Wang. 2023. SCIBENCH: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. In The 3rd Workshop on Mathematical Reasoning and AI at NeurIPS\u201923."},{"key":"e_1_2_1_289_1","unstructured":"Xuezhi Wang Jason Wei Dale Schuurmans Quoc\u00a0V Le Ed\u00a0H. Chi Sharan Narang Aakanksha Chowdhery and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In ICLR."},{"key":"e_1_2_1_290_1","doi-asserted-by":"crossref","unstructured":"Yizhong Wang Yeganeh Kordi Swaroop Mishra Alisa Liu Noah\u00a0A. Smith Daniel Khashabi and Hannaneh Hajishirzi. 2023. Self-Instruct: Aligning Language Model with Self Generated Instructions. In ACL.","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"e_1_2_1_291_1","volume-title":"EMNLP","author":"Wang Yan","unstructured":"Yan Wang, Xiaojiang Liu, and Shuming Shi. 2017. Deep Neural Solver for Math Word Problems. In EMNLP. 
ACL, Copenhagen, Denmark, 845\u2013854."},{"key":"e_1_2_1_292_1","volume-title":"et\u00a0al","author":"Wang Yizhong","year":"2022","unstructured":"Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut\u00a0Selvan Dhanasekaran, Atharva Naik, David Stap, et\u00a0al. 2022. Super-NaturalInstructions:Generalization via Declarative Instructions on 1600+ Tasks. In EMNLP."},{"key":"e_1_2_1_293_1","volume-title":"NEWTON: Are Large Language Models Capable of Physical Reasoning?. In EMNLP.","author":"Wang Yi\u00a0Ru","year":"2023","unstructured":"Yi\u00a0Ru Wang, Jiafei Duan, Dieter Fox, and Siddhartha Srinivasa. 2023. NEWTON: Are Large Language Models Capable of Physical Reasoning?. In EMNLP."},{"key":"e_1_2_1_294_1","volume-title":"Structured Pruning of Large Language Models","author":"Wang Ziheng","unstructured":"Ziheng Wang, Jeremy Wohlwend, and Tao Lei. 2020. Structured Pruning of Large Language Models. In EMNLP. Association for Computational Linguistics."},{"key":"e_1_2_1_295_1","volume-title":"et\u00a0al","author":"Wang Zekun","year":"2023","unstructured":"Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong\u00a0Yuan Sim, Xiuying Chen, et\u00a0al. 2023. Interactive natural language processing. arXiv preprint arXiv:2305.13246(2023)."},{"key":"e_1_2_1_296_1","volume-title":"et\u00a0al","author":"Watson L","year":"2023","unstructured":"Joseph\u00a0L Watson, David Juergens, Nathaniel\u00a0R Bennett, Brian\u00a0L Trippe, Jason Yim, Helen\u00a0E Eisenach, Woody Ahern, Andrew\u00a0J Borst, Robert\u00a0J Ragotte, Lukas\u00a0F Milles, et\u00a0al. 2023. De novo design of protein structure and function with RFdiffusion. Nature 620, 7976 (2023), 1089\u20131100."},{"key":"e_1_2_1_297_1","unstructured":"Jason Wei Maarten Bosma Vincent Zhao Kelvin Guu Adams\u00a0Wei Yu Brian Lester Nan Du Andrew\u00a0M. Dai and Quoc\u00a0V Le. 2022. 
Finetuned Language Models are Zero-Shot Learners. In ICLR."},{"key":"e_1_2_1_298_1","volume-title":"Emergent Abilities of Large Language Models. TMLR","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed\u00a0H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. 2022. Emergent Abilities of Large Language Models. TMLR (2022)."},{"key":"e_1_2_1_299_1","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc\u00a0V Le, Denny Zhou, et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS 35(2022), 24824\u201324837.","journal-title":"NeurIPS"},{"key":"e_1_2_1_300_1","unstructured":"Jason Weston and Sainbayar Sukhbaatar. 2023. System 2 Attention (is something you might need too). arxiv:2311.11829 \u00a0[cs.CL]"},{"key":"e_1_2_1_301_1","volume-title":"Formal methods: Practice and experience. ACM computing surveys (CSUR) 41, 4","author":"Woodcock Jim","year":"2009","unstructured":"Jim Woodcock, Peter\u00a0Gorm Larsen, Juan Bicarregui, and John Fitzgerald. 2009. Formal methods: Practice and experience. ACM computing surveys (CSUR) 41, 4 (2009), 1\u201336."},{"key":"e_1_2_1_302_1","unstructured":"Chenfei Wu Shengming Yin Weizhen Qi Xiaodong Wang Zecheng Tang and Nan Duan. 2023. Visual ChatGPT: Talking Drawing and Editing with Visual Foundation Models. arxiv:2303.04671 \u00a0[cs.CV]"},{"key":"e_1_2_1_303_1","volume-title":"LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. ACL","author":"Wu Minghao","year":"2024","unstructured":"Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, and Alham\u00a0Fikri Aji. 2024. 
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. ACL (2024)."},{"key":"e_1_2_1_304_1","volume-title":"et\u00a0al","author":"Wu Siwei","year":"2024","unstructured":"Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, et\u00a0al. 2024. A Comparative Study on Reasoning Patterns of OpenAI\u2019s o1 Model. arXiv preprint arXiv:2410.13639(2024)."},{"key":"e_1_2_1_305_1","volume-title":"MOFI: Learning Image Representations from Noisy Entity Annotated Images. In ICLR.","author":"Wu Wentao","year":"2024","unstructured":"Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, and Yinfei Yang. 2024. MOFI: Learning Image Representations from Noisy Entity Annotated Images. In ICLR."},{"key":"e_1_2_1_306_1","unstructured":"Yuhuai Wu Albert\u00a0Qiaochu Jiang Wenda Li Markus\u00a0Norman Rabe Charles\u00a0E Staats Mateja Jamnik and Christian Szegedy. 2022. Autoformalization with Large Language Models. In NeurIPS."},{"key":"e_1_2_1_307_1","volume-title":"Slotformer: Unsupervised visual dynamics simulation with object-centric models. arXiv preprint arXiv:2210.05861(2022).","author":"Wu Ziyi","year":"2022","unstructured":"Ziyi Wu, Nikita Dvornik, Klaus Greff, Thomas Kipf, and Animesh Garg. 2022. Slotformer: Unsupervised visual dynamics simulation with object-centric models. arXiv preprint arXiv:2210.05861(2022)."},{"key":"e_1_2_1_308_1","volume-title":"Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. In NAACL. ACL","author":"Wu Zhaofeng","year":"2024","unstructured":"Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Aky\u00fcrek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, and Yoon Kim. 2024. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. In NAACL. 
ACL, 1819\u20131862."},{"key":"e_1_2_1_309_1","unstructured":"Guangxuan Xiao Yuandong Tian Beidi Chen Song Han and Mike Lewis. 2024. Efficient Streaming Language Models with Attention Sinks. In ICLR."},{"key":"e_1_2_1_310_1","unstructured":"Yuxi Xie Kenji Kawaguchi Yiran Zhao Xu Zhao Min-Yen Kan Junxian He and Qizhe Xie. 2023. Self-Evaluation Guided Beam Search for Reasoning. In NeurIPS."},{"key":"e_1_2_1_311_1","doi-asserted-by":"crossref","unstructured":"Zhipeng Xie and Shichao Sun. 2019. A Goal-Driven Tree-Structured Neural Model for Math Word Problems.. In IJCAI. 5299\u20135305.","DOI":"10.24963\/ijcai.2019\/736"},{"key":"e_1_2_1_312_1","unstructured":"Jing Xiong Zixuan Li Chuanyang Zheng Zhijiang Guo Yichun Yin Enze Xie Zhicheng YANG Qingxing Cao Haiming Wang Xiongwei Han Jing Tang Chengming Li and Xiaodan Liang. 2024. DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning. In ICLR."},{"key":"e_1_2_1_313_1","volume-title":"TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models. In EMNLP.","author":"Xiong Jing","year":"2023","unstructured":"Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, Chuanyang Zheng, Xiaodan Liang, Ming Zhang, and Qun Liu. 2023. TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models. In EMNLP."},{"key":"e_1_2_1_314_1","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics. 
International Committee on Computational Linguistics, Barcelona, Spain (Online), 4762\u20134772","author":"Xu Liang","year":"2020","unstructured":"Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan. 2020. CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Barcelona, Spain (Online), 4762\u20134772."},{"key":"e_1_2_1_315_1","first-page":"100179","article-title":"Artificial intelligence: A powerful paradigm for scientific research","volume":"2","author":"Xu Yongjun","year":"2021","unstructured":"Yongjun Xu, Xin Liu, Xin Cao, Changping Huang, Enke Liu, Sen Qian, Xingchen Liu, Yanjun Wu, Fengliang Dong, Cheng-Wei Qiu, Junjun Qiu, Keqin Hua, Wentao Su, Jian Wu, Huiyu Xu, Yong Han, Chenguang Fu, Zhigang Yin, Miao Liu, Ronald Roepman, Sabine Dietmann, Marko Virta, Fredrick Kengara, Ze Zhang, Lifu Zhang, Taolan Zhao, Ji Dai, Jialiang Yang, Liang Lan, Ming Luo, Zhaofeng Liu, Tao An, Bin Zhang, Xiao He, Shan Cong, Xiaohong Liu, Wei Zhang, James\u00a0P. Lewis, James\u00a0M. Tiedje, Qi Wang, Zhulin An, Fei Wang, Libo Zhang, Tao Huang, Chuan Lu, Zhipeng Cai, Fang Wang, and Jiabao Zhang. 2021. Artificial intelligence: A powerful paradigm for scientific research. The Innovation 2, 4 (2021), 100179.","journal-title":"The Innovation"},{"key":"e_1_2_1_316_1","doi-asserted-by":"crossref","unstructured":"Fuzhao Xue Ziji Shi Futao Wei Yuxuan Lou Yong Liu and Yang You. 2022. Go wider instead of deeper. In AAAI Vol.\u00a0 36. 
8779\u20138787.","DOI":"10.1609\/aaai.v36i8.20858"},{"key":"e_1_2_1_317_1","doi-asserted-by":"crossref","unstructured":"Haoran Yang Yan Wang Piji Li Wei Bi Wai Lam and Chen Xu. 2023. Bridging the Gap between Pre-Training and Fine-Tuning for Commonsense Generation. 376\u2013383.","DOI":"10.18653\/v1\/2023.findings-eacl.28"},{"key":"e_1_2_1_318_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_319_1","unstructured":"Kaiyu Yang Aidan\u00a0M Swope Alex Gu Rahul Chalamala Peiyang Song Shixing Yu Saad Godil Ryan Prenger and Anima Anandkumar. 2023. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In NeurIPS Datasets and Benchmarks Track."},{"key":"e_1_2_1_320_1","unstructured":"Rui Yang Lin Song Yanwei Li Sijie Zhao Yixiao Ge Xiu Li and Ying Shan. 2023. GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction. In NeurIPS."},{"key":"e_1_2_1_321_1","volume-title":"Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung yi Lee.","author":"Chi Po-Han","year":"2021","unstructured":"Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I\u00a0Jeff Lai, Kushal Lakhotia, Yist\u00a0Y. Lin, Andy\u00a0T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung yi Lee. 2021. SUPERB: Speech Processing Universal PERformance Benchmark. In INTERSPEECH. 1194\u20131198."},{"key":"e_1_2_1_322_1","unstructured":"Zonglin Yang Xinya Du Rui Mao Jinjie Ni and Erik Cambria. 2023. Logical Reasoning over Natural Language as Knowledge Representation: A Survey. arXiv preprint arXiv:2303.12023(2023)."},{"key":"e_1_2_1_323_1","unstructured":"Shunyu Yao Dian Yu Jeffrey Zhao Izhak Shafran Thomas\u00a0L. Griffiths Yuan Cao and Karthik\u00a0R Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. 
In NeurIPS."},{"key":"e_1_2_1_324_1","unstructured":"Shunyu Yao Jeffrey Zhao Dian Yu Nan Du Izhak Shafran Karthik Narasimhan and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In ICLR."},{"key":"e_1_2_1_325_1","doi-asserted-by":"crossref","unstructured":"Yao Yao Zuchao Li and Hai Zhao. 2024. GoT: Effective Graph-of-Thought Reasoning in Language Models. In NAACL. ACL 2901\u20132921.","DOI":"10.18653\/v1\/2024.findings-naacl.183"},{"key":"e_1_2_1_326_1","unstructured":"Jiacheng Ye Zhiyong Wu Jiangtao Feng Tao Yu and Lingpeng Kong. 2023. Compositional exemplars for in-context learning. In ICML. JMLR.org Article 1662 16\u00a0pages."},{"key":"e_1_2_1_327_1","volume-title":"Crossfit: A few-shot learning challenge for cross-task generalization in nlp. EMNLP","author":"Ye Qinyuan","year":"2021","unstructured":"Qinyuan Ye, Bill\u00a0Yuchen Lin, and Xiang Ren. 2021. Crossfit: A few-shot learning challenge for cross-task generalization in nlp. EMNLP (2021)."},{"key":"e_1_2_1_328_1","doi-asserted-by":"crossref","unstructured":"Xi Ye Srinivasan Iyer Asli Celikyilmaz Veselin Stoyanov Greg Durrett and Ramakanth Pasunuru. 2023. Complementary Explanations for Effective In-Context Learning. In ACL. ACL 4469\u20134484.","DOI":"10.18653\/v1\/2023.findings-acl.273"},{"key":"e_1_2_1_329_1","volume-title":"Reasoning in Reasoning: A Hierarchical Framework for (Better and Faster) Neural Theorem Proving. In The 4th Workshop on Mathematical Reasoning and AI at NeurIPS\u201924","author":"Ye Ziyu","year":"2024","unstructured":"Ziyu Ye, Jiacheng Chen, Jonathan Light, Yifei Wang, Jiankai Sun, Guohao Li, Mac Schwager, Philip Torr, Yuxin Chen, Kaiyu Yang, Yisong Yue, and Ziniu Hu. 2024. Reasoning in Reasoning: A Hierarchical Framework for (Better and Faster) Neural Theorem Proving. In The 4th Workshop on Mathematical Reasoning and AI at NeurIPS\u201924."},{"key":"e_1_2_1_330_1","volume-title":"CLEVRER: Collision Events for Video Representation and Reasoning. 
In ICLR.","author":"Kexin","year":"2020","unstructured":"Kexin Yi*, Chuang Gan*, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua\u00a0B. Tenenbaum. 2020. CLEVRER: Collision Events for Video Representation and Reasoning. In ICLR."},{"key":"e_1_2_1_331_1","volume-title":"Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation. In EMNLP. ACL, 4031\u20134047.","author":"Yin Da","year":"2023","unstructured":"Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, and Kai-Wei Chang. 2023. Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation. In EMNLP. ACL, 4031\u20134047."},{"key":"e_1_2_1_332_1","unstructured":"Shukang Yin Chaoyou Fu Sirui Zhao Ke Li Xing Sun Tong Xu and Enhong Chen. 2024. A survey on multimodal large language models. National Science Review(2024) nwae403."},{"key":"e_1_2_1_333_1","unstructured":"Zhangyue Yin Qiushi Sun Cheng Chang Qipeng Guo Junqi Dai Xuanjing Huang and Xipeng Qiu. 2023. Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication. In EMNLP."},{"key":"e_1_2_1_334_1","volume-title":"ACL","author":"Yin Zhangyue","unstructured":"Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, and Xuanjing Huang. 2023. Do Large Language Models Know What They Don\u2019t Know?. In ACL. ACL, Toronto, Canada, 8653\u20138665."},{"key":"e_1_2_1_335_1","volume-title":"Statler: State-Maintaining Language Models for Embodied Reasoning. In ICRA.","author":"Yoneda Takuma","year":"2024","unstructured":"Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, and Matthew\u00a0R. Walter. 2024. Statler: State-Maintaining Language Models for Embodied Reasoning. In ICRA."},{"key":"e_1_2_1_336_1","unstructured":"Longhui Yu Weisen Jiang Han Shi Jincheng YU Zhengying Liu Yu Zhang James Kwok Zhenguo Li Adrian Weller and Weiyang Liu. 2024. 
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models. In ICLR."},{"key":"e_1_2_1_337_1","volume-title":"PACS: A Dataset for Physical Audiovisual CommonSense Reasoning. ECCV","author":"Yu Samuel","year":"2022","unstructured":"Samuel Yu, Peter Wu, Paul\u00a0Pu Liang, Ruslan Salakhutdinov, and Louis-Philippe Morency. 2022. PACS: A Dataset for Physical Audiovisual CommonSense Reasoning. ECCV (2022)."},{"key":"e_1_2_1_338_1","volume-title":"RRHF: Rank Responses to Align Language Models with Human Feedback. In NeurIPS.","author":"Yuan Hongyi","year":"2023","unstructured":"Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, and Fei Huang. 2023. RRHF: Rank Responses to Align Language Models with Human Feedback. In NeurIPS."},{"key":"e_1_2_1_339_1","unstructured":"Xiang Yue Xingwei Qu Ge Zhang Yao Fu Wenhao Huang Huan Sun Yu Su and Wenhu Chen. 2024. MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning. In ICLR."},{"key":"e_1_2_1_340_1","volume-title":"Hellaswag: Can a machine really finish your sentence?arXiv preprint arXiv:1905.07830(2019).","author":"Zellers Rowan","year":"2019","unstructured":"Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. Hellaswag: Can a machine really finish your sentence?arXiv preprint arXiv:1905.07830(2019)."},{"key":"e_1_2_1_341_1","unstructured":"Aohan Zeng Xiao Liu Zhengxiao Du Zihan Wang Hanyu Lai Ming Ding Zhuoyi Yang Yifan Xu Wendi Zheng Xiao Xia Weng\u00a0Lam Tam Zixuan Ma Yufei Xue Jidong Zhai Wenguang Chen Zhiyuan Liu Peng Zhang Yuxiao Dong and Jie Tang. 2023. GLM-130B: An Open Bilingual Pre-trained Model. In ICLR."},{"key":"e_1_2_1_342_1","unstructured":"Hongxin Zhang Weihua Du Jiaming Shan Qinhong Zhou Yilun Du Joshua\u00a0B. Tenenbaum Tianmin Shu and Chuang Gan. 2024. Building Cooperative Embodied Agents Modularly with Large Language Models. 
In ICLR."},{"key":"e_1_2_1_343_1","volume-title":"et\u00a0al","author":"Zhang Kai","year":"2024","unstructured":"Kai Zhang, Rong Zhou, Eashan Adhikarla, Zhiling Yan, Yixin Liu, Jun Yu, Zhengliang Liu, Xun Chen, Brian\u00a0D Davison, Hui Ren, et\u00a0al. 2024. A generalist vision\u2013language foundation model for diverse biomedical tasks. Nature Medicine (2024), 1\u201313."},{"key":"e_1_2_1_344_1","volume-title":"et\u00a0al","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi\u00a0Victoria Lin, et\u00a0al. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068(2022)."},{"key":"e_1_2_1_345_1","volume-title":"Prompt Highlighter: Interactive Control for Multi-Modal LLMs. CVPR","author":"Zhang Yuechen","year":"2024","unstructured":"Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, and Jiaya Jia. 2024. Prompt Highlighter: Interactive Control for Multi-Modal LLMs. CVPR (2024)."},{"key":"e_1_2_1_346_1","unstructured":"Zhuosheng Zhang Aston Zhang Mu Li and Alex Smola. 2023. Automatic Chain of Thought Prompting in Large Language Models. In ICLR."},{"key":"e_1_2_1_347_1","doi-asserted-by":"crossref","unstructured":"Hongyu Zhao Kangrui Wang Mo Yu and Hongyuan Mei. 2023. Explicit Planning Helps Language Models in Logical Reasoning. In EMNLP. ACL 11155\u201311173.","DOI":"10.18653\/v1\/2023.emnlp-main.688"},{"key":"e_1_2_1_348_1","doi-asserted-by":"crossref","unstructured":"James\u00a0Xu Zhao Yuxi Xie Kenji Kawaguchi Junxian He and Michael\u00a0Qizhe Xie. 2023. Automatic Model Selection with Large Language Models for Reasoning. In EMNLP.","DOI":"10.18653\/v1\/2023.findings-emnlp.55"},{"key":"e_1_2_1_349_1","unstructured":"Xueliang Zhao Wenda Li and Lingpeng Kong. 2023. Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving. 
arXiv preprint arXiv:2305.16366(2023)."},{"key":"e_1_2_1_350_1","unstructured":"Yao Zhao Mikhail Khalman Rishabh Joshi Shashi Narayan Mohammad Saleh and Peter\u00a0J Liu. 2022. Calibrating Sequence likelihood Improves Conditional Language Generation. In ICLR."},{"key":"e_1_2_1_351_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_352_1","volume-title":"RSS 2023 Workshop on Learning for Task and Motion Planning.","author":"Zhao Zirui","year":"2023","unstructured":"Zirui Zhao, Wee\u00a0Sun Lee, and David Hsu. 2023. Large Language Models as Commonsense Knowledge for Large-Scale Task Planning. In RSS 2023 Workshop on Learning for Task and Motion Planning."},{"key":"e_1_2_1_353_1","volume-title":"et\u00a0al","author":"Zheng Chuanyang","year":"2024","unstructured":"Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, et\u00a0al. 2024. DAPE V2: Process Attention Score as Feature Map for Length Extrapolation. arXiv preprint arXiv:2410.04798(2024)."},{"key":"e_1_2_1_354_1","volume-title":"Lyra: Orchestrating Dual Correction in Automated Theorem Proving. TMLR","author":"Zheng Chuanyang","year":"2024","unstructured":"Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, and Yu Li. 2024. Lyra: Orchestrating Dual Correction in Automated Theorem Proving. TMLR (2024)."},{"key":"e_1_2_1_355_1","unstructured":"Kunhao Zheng Jesse\u00a0Michael Han and Stanislas Polu. 2022. miniF2F: a cross-system benchmark for formal Olympiad-level mathematics. In ICLR."},{"key":"e_1_2_1_356_1","unstructured":"Denny Zhou Nathanael Sch\u00e4rli Le Hou Jason Wei Nathan Scales Xuezhi Wang Dale Schuurmans Claire Cui Olivier Bousquet Quoc\u00a0V Le and Ed\u00a0H. Chi. 2023. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. 
In ICLR."},{"key":"e_1_2_1_357_1","doi-asserted-by":"crossref","unstructured":"Lipu Zhou Shuaixiang Dai and Liwei Chen. 2015. Learn to Solve Algebra Word Problems Using Quadratic Programming. In EMNLP.","DOI":"10.18653\/v1\/D15-1096"},{"key":"e_1_2_1_358_1","volume-title":"Moucheng Xu, Mateo\u00a0G Lozano, Peter Woodward-Court, et\u00a0al.","author":"Zhou Yukun","year":"2023","unstructured":"Yukun Zhou, Mark\u00a0A Chia, Siegfried\u00a0K Wagner, Murat\u00a0S Ayhan, Dominic\u00a0J Williamson, Robbert\u00a0R Struyven, Timing Liu, Moucheng Xu, Mateo\u00a0G Lozano, Peter Woodward-Court, et\u00a0al. 2023. A foundation model for generalizable disease detection from retinal images. Nature (2023), 1\u20138."},{"key":"e_1_2_1_359_1","unstructured":"Xunyu Zhu Jian Li Yong Liu Can Ma and Weiping Wang. 2023. A Survey on Model Compression for Large Language Models. arxiv:2308.07633 \u00a0[cs.CL]"},{"key":"e_1_2_1_360_1","volume-title":"Conf. on Robot Learning(PMLR, Vol.\u00a0 229)","author":"Zitkovich Brianna","year":"2023","unstructured":"Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, Quan Vuong, Vincent Vanhoucke, Huong Tran, Radu Soricut, Anikait Singh, Jaspiar Singh, Pierre Sermanet, Pannag\u00a0R. Sanketi, Grecia Salazar, Michael\u00a0S. Ryoo, Krista Reymann, Kanishka Rao, Karl Pertsch, Igor Mordatch, Henryk Michalewski, Yao Lu, Sergey Levine, Lisa Lee, Tsang-Wei\u00a0Edward Lee, Isabel Leal, Yuheng Kuang, Dmitry Kalashnikov, Ryan Julian, Nikhil\u00a0J. Joshi, Alex Irpan, Brian Ichter, Jasmine Hsu, Alexander Herzog, Karol Hausman, Keerthana Gopalakrishnan, Chuyuan Fu, Pete Florence, Chelsea Finn, Kumar\u00a0Avinava Dubey, Danny Driess, Tianli Ding, Krzysztof\u00a0Marcin Choromanski, Xi Chen, Yevgen Chebotar, Justice Carbajal, Noah Brown, Anthony Brohan, Montserrat\u00a0Gonzalez Arenas, and Kehang Han. 2023. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. In Conf. 
on Robot Learning(PMLR, Vol.\u00a0 229). PMLR, 2165\u20132183."},{"key":"e_1_2_1_361_1","doi-asserted-by":"crossref","unstructured":"Zhu Ziyu Ma Xiaojian Chen Yixin Deng Zhidong Huang Siyuan and Li Qing. 2023. 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment. In ICCV.","DOI":"10.1109\/ICCV51070.2023.00272"},{"key":"e_1_2_1_362_1","volume-title":"MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation","author":"Zuo Simiao","unstructured":"Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, and Weizhu Chen. 2022. MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation. In NAACL. Association for Computational Linguistics, 1610\u20131623."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729218","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T10:58:38Z","timestamp":1744369118000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729218"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,11]]},"references-count":360,"alternative-id":["10.1145\/3729218"],"URL":"https:\/\/doi.org\/10.1145\/3729218","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,11]]},"assertion":[{"value":"2024-03-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"3729218"}}