{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T10:32:05Z","timestamp":1777285925746,"version":"3.51.4"},"reference-count":168,"publisher":"Association for Computing Machinery (ACM)","issue":"11","funder":[{"DOI":"10.13039\/501100010446","name":"Institute for Basic Science","doi-asserted-by":"publisher","award":["IBS-R029-C2, IBS-R030-C1"],"award-info":[{"award-number":["IBS-R029-C2, IBS-R030-C1"]}],"id":[{"id":"10.13039\/501100010446","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["RS-2024-00397681"],"award-info":[{"award-number":["RS-2024-00397681"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100014188","name":"Ministry of Science and ICT, South Korea","doi-asserted-by":"publisher","award":["N10250153"],"award-info":[{"award-number":["N10250153"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,8,30]]},"abstract":"<jats:p>Recently, with advances in Large Language Models (LLMs), robot navigation models have demonstrated superior generalization capabilities across environment perception, decision-making, reasoning, planning, instruction understanding, and human-robot interaction. In this article, we systematically review recent LLM-based robot navigation research articles and categorize them into a novel taxonomy comprising perception, planning, control, interaction, and coordination. We also present an overview of the principal datasets, simulations, and metrics used in robot navigation, analyzing the distinctive characteristics of the datasets and the performance of the main LLM-based methods. 
Furthermore, we discuss the challenges hindering the integration of LLMs into robot navigation and provide opportunities and potential directions for future development.<\/jats:p>","DOI":"10.1145\/3802539","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T21:08:39Z","timestamp":1773781719000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Robot Navigation via Foundation Language Models: A Review"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-6037-7196","authenticated-orcid":false,"given":"Haotian","family":"Pan","sequence":"first","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7777-4035","authenticated-orcid":false,"given":"Shibo","family":"Huang","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2373-8799","authenticated-orcid":false,"given":"Jian","family":"Yang","sequence":"additional","affiliation":[{"name":"Information Engineering University","place":["Zhengzhou, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0506-9707","authenticated-orcid":false,"given":"Jinpeng","family":"Mi","sequence":"additional","affiliation":[{"name":"Institute of Machine Intelligence, University of Shanghai for Science and Technology","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7873-1554","authenticated-orcid":false,"given":"Ke","family":"Li","sequence":"additional","affiliation":[{"name":"Information Engineering University","place":["Zhengzhou, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2683-1879","authenticated-orcid":false,"given":"Xiong","family":"You","sequence":"additional","affiliation":[{"name":"Information Engineering University","place":["Zhengzhou, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9442-9366","authenticated-orcid":false,"given":"Peidong","family":"Liang","sequence":"additional","affiliation":[{"name":"Fujian (Quanzhou)-HIT Research Institute of Engineering and Technology","place":["Quanzhou, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2523-0941","authenticated-orcid":false,"given":"Jinbo","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Shanghai for Science and Technology","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4872-2404","authenticated-orcid":false,"given":"Yingjie","family":"Liu","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7156-1724","authenticated-orcid":false,"given":"Jianfeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8281-9349","authenticated-orcid":false,"given":"Muyu","family":"Wang","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6218-603X","authenticated-orcid":false,"given":"Jie","family":"Yang","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5000-2483","authenticated-orcid":false,"given":"Xinyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2421-3874","authenticated-orcid":false,"given":"Lijun","family":"Zhao","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology","place":["Harbin, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3922-0989","authenticated-orcid":false,"given":"Mingsong","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Embedded Software and Systems, East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2589-0164","authenticated-orcid":false,"given":"Jie","family":"Zhou","sequence":"additional","affiliation":[{"name":"East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9566-7814","authenticated-orcid":false,"given":"Xian","family":"Wei","sequence":"additional","affiliation":[{"name":"Software Engineering Institute, East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,25]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1098\/rstb.2021.0447"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJIES.2022.3179617"},{"key":"e_1_3_1_4_2","first-page":"944","volume-title":"Proceedings of the National Conference on Artificial Intelligence","author":"Thrun Sebastian","year":"1996","unstructured":"Sebastian Thrun and Arno B\u00fccken. 1996. 
Integrating grid-based and topological maps for mobile robot navigation. In Proceedings of the National Conference on Artificial Intelligence. 944\u2013951."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(98)00023-X"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","unstructured":"Yoram Koren and Johann Borenstein. 1991. Potential field methods and their inherent limitations for mobile robot navigation. In Proceedings of the IEEE International Conference on Robotics and Automation. 1398\u20131404. DOI:10.1109\/ROBOT.1991.131606","DOI":"10.1109\/ROBOT.1991.131606"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.30720"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/70.88147"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1108\/01439910010378879"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10161227"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-022-10039-8"},{"key":"e_1_3_1_12_2","first-page":"1877","volume-title":"Proceedings of the Advances in Neural Information Processing Systems.","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc., 1877\u20131901."},{"key":"e_1_3_1_13_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers). 
Association for Computational Linguistics 4171\u20134186."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10611462"},{"key":"e_1_3_1_15_2","unstructured":"OpenAI Matthias Plappert Raul Sampedro Tao Xu Ilge Akkaya Vineet Kosaraju Peter Welinder Ruben D\u2019Sa Arthur Petron Henrique P. d O. Pinto Alex Paino Hyeonwoo Noh Lilian Weng Qiming Yuan Casey Chu and Wojciech Zaremba. 2021. Asymmetric self-play for automatic goal discovery in robotic manipulation. arXiv preprint arXiv:2101.04882 (2021)."},{"key":"e_1_3_1_16_2","unstructured":"Jinzhou Lin Han Gao Xuxiang Feng Rongtao Xu Changwei Wang Man Zhang Li Guo and Shibiao Xu. 2023. The development of LLMs for embodied navigation. arXiv preprint arXiv:2311.00530 (2023)."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jai.2024.12.003"},{"issue":"240","key":"e_1_3_1_18_2","first-page":"1","article-title":"Palm: Scaling language modeling with pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et\u00a0al. 2023. Palm: Scaling language modeling with pathways. 
Journal of Machine Learning Research 24, 240 (2023), 1\u2013113.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_19_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_1_20_2","unstructured":"OpenAI Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et\u00a0al. 2023. Gpt-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter Fei Xia Ed Chi Quoc V. Le and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022) 24824\u201324837. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/hash\/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html","DOI":"10.52202\/068431-1800"},{"key":"e_1_3_1_22_2","unstructured":"Michael Ahn Anthony Brohan Noah Brown Yevgen Chebotar Omar Cortes Byron David Chelsea Finn Chuyuan Fu Keerthana Gopalakrishnan Karol Hausman Alex Herzog Daniel Ho Jasmine Hsu Julian Ibarz Brian Ichter Alex Irpan Eric Jang Rosario Jauregui Ruano Kyle Jeffrey Sally Jesmonth Nikhil J Joshi Ryan Julian Dmitry Kalashnikov Yuheng Kuang Kuang-Huei Lee Sergey Levine Yao Lu Linda Luu Carolina Parada Peter Pastor Jornell Quiambao Kanishka Rao Jarek Rettinghouse Diego Reyes Pierre Sermanet Nicolas Sievers Clayton Tan Alexander Toshev Vincent Vanhoucke Fei Xia Ted Xiao Peng Xu Sichun Xu Mengyuan Yan and Andy Zeng. 2022. Do as I can not as I say: Grounding language in robotic affordances. arXiv:2204.01691. Retrieved from https:\/\/arxiv.org\/abs\/2204.01691"},{"key":"e_1_3_1_23_2","unstructured":"Danny Driess Fei Xia Mehdi S. M. Sajjadi Corey Lynch Aakanksha Chowdhery Brian Ichter Ayzaan Wahid Jonathan Tompson Quan Vuong Tianhe Yu Wenlong Huang Yevgen Chebotar Pierre Sermanet Daniel Duckworth Sergey Levine Vincent Vanhoucke Karol Hausman Marc Toussaint Klaus Greff Andy Zeng Igor Mordatch and Pete Florence. 2023. Palm-e: An embodied multimodal language model. arXiv:2303.03378. Retrieved from https:\/\/arxiv.org\/abs\/2303.03378"},{"key":"e_1_3_1_24_2","first-page":"9118","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Huang Wenlong","year":"2022","unstructured":"Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. 2022. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In Proceedings of the International Conference on Machine Learning. PMLR, 9118\u20139147."},{"key":"e_1_3_1_25_2","unstructured":"Wenlong Huang Chen Wang Ruohan Zhang Yunzhu Li Jiajun Wu and Li Fei-Fei. 2023. 
Voxposer: Composable 3d value maps for robotic manipulation with language models. arXiv:2307.05973. Retrieved from https:\/\/arxiv.org\/abs\/2307.05973"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160591"},{"key":"e_1_3_1_27_2","unstructured":"Guanzhi Wang Yuqi Xie Yunfan Jiang Ajay Mandlekar Chaowei Xiao Yuke Zhu Linxi Fan and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv:2305.16291. Retrieved from https:\/\/arxiv.org\/abs\/2305.16291"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008806205438"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2012.6225199"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-017-0806-2"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10339-011-0404-1"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1177\/1729881419839596"},{"key":"e_1_3_1_33_2","volume-title":"Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications","author":"Lewis Frank L.","year":"2018","unstructured":"Frank L. Lewis and Shuzhi Sam Ge. 2018. Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications. 
CRC Press."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1994.365739"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2001.976276"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1177\/1729881417750785"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.11591\/ijeecs.v20.i1.pp500-509"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-0417(03)00007-X"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2006.282501"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/1207\/1\/012018"},{"issue":"1","key":"e_1_3_1_41_2","first-page":"8269698","article-title":"An Overview of Nature-Inspired, Conventional, and Hybrid Methods of Autonomous Vehicle Path Planning","volume":"2018","author":"Ayawli Ben Beklisi Kwame","year":"2018","unstructured":"Ben Beklisi Kwame Ayawli, Ryad Chellali, Albert Yaw Appiah, and Frimpong Kyeremeh. 2018. An Overview of Nature-Inspired, Conventional, and Hybrid Methods of Autonomous Vehicle Path Planning. Journal of Advanced Transportation 2018, 1 (2018), 8269698.","journal-title":"Journal of Advanced Transportation"},{"key":"e_1_3_1_42_2","unstructured":"Kevin Osanlou Christophe Guettier Tristan Cazenave and Eric Jacopin. 2022. Planning and learning: A review of methods involving path-planning for autonomous vehicles. arXiv preprint arXiv:2207.13181 (2022)."},{"issue":"1","key":"e_1_3_1_43_2","first-page":"2538220","article-title":"A review on path planning and obstacle avoidance algorithms for autonomous mobile robots","volume":"2022","author":"Rafai Anis Naema Atiyah","year":"2022","unstructured":"Anis Naema Atiyah Rafai, Noraziah Adzhar, and Nor Izzati Jaini. 2022. A review on path planning and obstacle avoidance algorithms for autonomous mobile robots. 
Journal of Robotics 2022, 1 (2022), 2538220.","journal-title":"Journal of Robotics"},{"issue":"1","key":"e_1_3_1_44_2","first-page":"9493","article-title":"The machine map and its conceptual model","volume":"26","author":"YOU Xiong","year":"2024","unstructured":"Xiong YOU, Fenli JIA, Jiangpeng TIAN, Jian YANG, and Ke LI. 2024. The machine map and its conceptual model. Journal of Geo-information Science 26, 1 (2024), 9493\u20139500.","journal-title":"Journal of Geo-information Science"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA55743.2025.11128157"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS60139.2025.11247564"},{"issue":"5","key":"e_1_3_1_47_2","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.11834\/jrs.20233066","article-title":"The cognitive logic and map construction model of machine maps","volume":"28","author":"Jia Fenli","year":"2023","unstructured":"Fenli Jia, Jian Yang, Xiong You, L. I. Ke, Jiangpeng Tian, and Shulei Zheng. 2023. The cognitive logic and map construction model of machine maps. National Remote Sensing Bulletin 28, 5 (2023), 1177\u20131188.","journal-title":"National Remote Sensing Bulletin"},{"issue":"14","key":"e_1_3_1_48_2","first-page":"516","article-title":"Information processing model of machine maps","volume":"49","author":"YOU Xiong","year":"2024","unstructured":"Xiong YOU, Ke LI, Jiangpeng TIAN, Jian YANG, Anzhu YU, and Fenli JIA. 2024. Information processing model of machine maps. Geomatics and Information Science of Wuhan University 49, 14 (2024), 516\u2013526.","journal-title":"Geomatics and Information Science of Wuhan University"},{"key":"e_1_3_1_49_2","first-page":"3660","volume-title":"Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development","author":"Aggarwal Swati","year":"2016","unstructured":"Swati Aggarwal, Kushagra Sharma, and Manisha Priyadarshini. 2016. 
Robot navigation: Review of techniques and research challenges. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development. IEEE, 3660\u20133665."},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2013.6696579"},{"key":"e_1_3_1_51_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS 2017) 30 (2017), 5998\u20136008.","journal-title":"Advances in Neural Information Processing Systems (NeurIPS 2017)"},{"key":"e_1_3_1_52_2","unstructured":"Jacob Devlin. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_1_53_2","unstructured":"Fanlong Zeng Wensheng Gan Yongheng Wang Ning Liu and Philip S. Yu. 2023. Large language models for robotics: A survey. arXiv:2311.07226. Retrieved from https:\/\/arxiv.org\/abs\/2311.07226"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10611275"},{"key":"e_1_3_1_55_2","unstructured":"Yen-Jen Wang Bike Zhang Jianyu Chen and Koushil Sreenath. 2023. Prompt a robot to walk with large language models. arXiv:2309.09969. Retrieved from https:\/\/arxiv.org\/abs\/2309.09969"},{"key":"e_1_3_1_56_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et\u00a0al. 2022. Training language models to follow instructions with human feedback. 
Advances in Neural Information Processing Systems 35 (2022), 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_57_2","unstructured":"Dhruv Shah B\u0142a\u017cej Osi\u0144ski Brian Ichter Sergey Levine. 2023. Lm-nav: Robotic navigation with large pre-trained models of language vision and action. In Proceedings of the Conference on Robot Learning. PMLR 492\u2013504. Retrieved from https:\/\/proceedings.mlr.press\/v205\/shah23b.html"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.52202\/068431-1723"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","unstructured":"Yinan Deng Jiahui Wang Jingyu Zhao Xinyu Tian Guangyan Chen Yi Yang and Yufeng Yue. 2024. OpenGraph: Open-vocabulary hierarchical 3D graph representation in large-scale outdoor environments. arXiv preprint arXiv:2403.09412 (2024).","DOI":"10.1109\/LRA.2024.3445607"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS55552.2023.10342363"},{"key":"e_1_3_1_61_2","unstructured":"Xiuye Gu Tsung-Yi Lin Weicheng Kuo and Yin Cui. 2022. Open-vocabulary object detection via vision and language knowledge distillation. arXiv:2104.13921v3. Retrieved from https:\/\/arxiv.org\/abs\/2104.13921v3"},{"key":"e_1_3_1_62_2","unstructured":"Sipeng Zheng Jiazheng Liu Yicheng Feng Zongqing Lu. 2023. Steve-eye: Equipping LLM-based embodied agents with visual perception in open worlds. In Proceedings of the 12th International Conference on Learning Representations. 
Retrieved from https:\/\/arxiv.org\/pdf\/2310.13255.pdf"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i7.28597"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610499"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA55743.2025.11128669"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA55743.2025.11127890"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/FLLM63129.2024.10852423"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2016.2624754"},{"key":"e_1_3_1_69_2","unstructured":"Edmond Tong Anthony Opipari Stanley Lewis Zhen Zeng and Odest Chadwicke Jenkins. 2024. OVAL-prompt: Open-vocabulary affordance localization for robot manipulation through LLM affordance-grounding. arXiv:2404.11000. Retrieved from https:\/\/arxiv.org\/abs\/2404.11000"},{"key":"e_1_3_1_70_2","unstructured":"Paola Ard\u00f3n \u00c9ric Pairet Katrin S. Lohan Subramanian Ramamoorthy and Ronald Petrick. 2020. Affordances in robotic tasks - a survey. arXiv:2004.07400. Retrieved from https:\/\/arxiv.org\/abs\/2004.07400"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.3390\/app14198868"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10611163"},{"key":"e_1_3_1_73_2","unstructured":"Roya Firoozi Johnathan Tucker Stephen Tian Anirudha Majumdar Jiankai Sun Weiyu Liu Yuke Zhu Shuran Song Ashish Kapoor Karol Hausman et\u00a0al. 2023. Foundation models in robotics: Applications challenges and the future. arXiv:2312.07843. 
Retrieved from https:\/\/arxiv.org\/abs\/2312.07843"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1117\/12.942791"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-023-10131-7"},{"key":"e_1_3_1_76_2","first-page":"201","volume-title":"Proceedings of the 7th Conference on Robot Learning (Proceedings of Machine Learning Research)","author":"Wang Chen","year":"2023","unstructured":"Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, and Anima Anandkumar. 2023. MimicPlay: Long-horizon imitation learning by watching human play. In Proceedings of the 7th Conference on Robot Learning (Proceedings of Machine Learning Research). 201\u2013221."},{"key":"e_1_3_1_77_2","unstructured":"Jinjie Mai Jun Chen Bingchuan Li Guocheng Qian Mohamed Elhoseiny and Bernard Ghanem. 2023. LLM as a robotic brain: Unifying egocentric memory and control. arXiv preprint arXiv:2304.09349v1 (2023)."},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1609\/icaps.v34i1.31506"},{"key":"e_1_3_1_79_2","unstructured":"Drew McDermott Malik Ghallab Adele Howe Craig Knoblock Ashwin Ram Manuela Veloso Daniel Weld and David Wilkins. 1998. PDDL\u2014the planning domain definition language. Technical Report CVC TR-98-003\/DCS TR-1165. Yale Center for Computational Vision and Control."},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980391"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3511402"},{"key":"e_1_3_1_82_2","doi-asserted-by":"crossref","unstructured":"Yanan Guo Yanshu Ni Jing Jin Yu Jiang Dandan Li Hongyang Zhao and Yi Shen. 2025. Embodied assistant: Robot mobility operations guided by open vocabulary in open environments utilizing LLM. IEEE Internet of Things Journal (2025). 
Early Access.","DOI":"10.1109\/JIOT.2025.3590433"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3518105"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA55743.2025.11127706"},{"key":"e_1_3_1_85_2","doi-asserted-by":"crossref","unstructured":"Tingting Yang Ping Feng Qixin Guo Jindi Zhang Jiahong Ning Xinghan Wang and Zhongyang Mao. 2025. AutoHMA-LLM: Efficient task coordination and execution in heterogeneous multi-agent systems using hybrid large language models. IEEE Transactions on Cognitive Communications and Networking 11 2 (2025) 987\u2013998.","DOI":"10.1109\/TCCN.2025.3528892"},{"key":"e_1_3_1_86_2","doi-asserted-by":"crossref","unstructured":"Fernando Cladera Zachary Ravichandran Jason Hughes Varun Murali Carlos Nieto-Granda M. Ani Hsieh George J. Pappas Camillo J. Taylor and Vijay Kumar. 2025. Air-ground collaboration for language-specified missions in unknown environments. arXiv:2505.09108. Retrieved from https:\/\/arxiv.org\/abs\/2505.09108","DOI":"10.1109\/TFR.2025.3584019"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2025.3531410"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574711000828"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3056373"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/DEVLRN.2009.5175519"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2021.769829"},{"key":"e_1_3_1_92_2","unstructured":"Siyuan Huang Zhengkai Jiang Hao Dong Yu Qiao Peng Gao and Hongsheng Li. 2023. Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model. arXiv preprint arXiv:2305.11176v1 (2023)."},{"key":"e_1_3_1_93_2","doi-asserted-by":"crossref","unstructured":"Sai H. Vemprala Rogerio Bonatti Arthur Bucker and Ashish Kapoor. 2024. ChatGPT for robotics: Design principles and model abilities. 
IEEE Access 12 (2024) 55682\u201355696.","DOI":"10.1109\/ACCESS.2024.3387941"},{"key":"e_1_3_1_94_2","volume-title":"A Framework for LLM-based Lifelong Learning in Robot Manipulation","author":"Mao Jerry W.","year":"2024","unstructured":"Jerry W. Mao. 2024. A Framework for LLM-based Lifelong Learning in Robot Manipulation. Ph.D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_3_1_95_2","volume-title":"Proceedings of the Workshop on Learning Effective Abstractions for Planning","author":"Parakh Meenal","year":"2023","unstructured":"Meenal Parakh, Alisha Fong, Anthony Simeonov, Tao Chen, Abhishek Gupta, and Pulkit Agrawal. 2023. Lifelong robot learning with human assisted language planners. In Proceedings of the Workshop on Learning Effective Abstractions for Planning."},{"key":"e_1_3_1_96_2","unstructured":"Shu Yang Muhammad Asif Ali Cheng-Long Wang Lijie Hu and Di Wang. 2024. MoRAL: MoE augmented LoRA for LLMs\u2019 lifelong learning. arXiv preprint arXiv:2402.11260v1 (2024)."},{"key":"e_1_3_1_97_2","volume-title":"Proceedings of the 7th Annual Conference on Robot Learning","author":"Zitkovich Brianna","year":"2023","unstructured":"Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, et\u00a0al. 2023. RT-2: Vision-language-action models transfer web knowledge to robotic control. In Proceedings of the 7th Annual Conference on Robot Learning. Retrieved from https:\/\/openreview.net\/forum?id=XMQgwiJ7KSX"},{"key":"e_1_3_1_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610353"},{"key":"e_1_3_1_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610689"},{"key":"e_1_3_1_100_2","unstructured":"Chak Lam Shek Xiyang Wu Dinesh Manocha Pratap Tokekar and Amrit Singh Bedi. 2023. LANCAR: Leveraging language for context-aware robot locomotion in unstructured environments. 
arXiv preprint arXiv:2310.00481v1 (2023)."},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1080\/01691864.2020.1813623"},{"key":"e_1_3_1_102_2","unstructured":"Haokun Liu Yaonan Zhu Kenji Kato Atsushi Tsukahara Izumi Kondo Tadayoshi Aoyama and Yasuhisa Hasegawa. 2024. Enhancing the LLM-based robot manipulation through human-robot collaboration. arXiv:2406.14097v2. Retrieved from https:\/\/arxiv.org\/abs\/2406.14097v2"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2025\/1190"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01442"},{"key":"e_1_3_1_105_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19842-7_29"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3145964"},{"key":"e_1_3_1_107_2","unstructured":"Chao Yu Xinyi Yang Jiaxuan Gao Jiayu Chen Yunfei Li Jijia Liu Yunfei Xiang Ruixin Huang Huazhong Yang Yi Wu and Yu Wang. 2023. Asynchronous multi-agent reinforcement learning for efficient real-time multi-robot cooperative exploration. arXiv:2301.03398v2. Retrieved from https:\/\/arxiv.org\/abs\/2301.03398v2"},{"key":"e_1_3_1_108_2","unstructured":"Bangguo Yu Hamidreza Kasaei and Ming Cao. 2023. Co-NavGPT: Multi-robot cooperative visual semantic navigation using large language models. arXiv:2310.07937v3. Retrieved from https:\/\/arxiv.org\/abs\/2310.07937v3"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610855"},{"key":"e_1_3_1_110_2","unstructured":"Hongxin Zhang Weihua Du Jiaming Shan Qinhong Zhou Yilun Du Joshua B. Tenenbaum Tianmin Shu and Chuang Gan. 2023. Building cooperative embodied agents modularly with large language models. arXiv:2307.02485v2. Retrieved from https:\/\/arxiv.org\/abs\/2307.02485v2"},{"key":"e_1_3_1_111_2","unstructured":"Bin Zhang Hangyu Mao Jingqing Ruan Ying Wen Yang Li Shao Zhang Zhiwei Xu Dapeng Li Ziyue Li Rui Zhao et\u00a0al. 2023. 
Controlling large language model-based agents for large-scale decision-making: An actor-critic approach. arXiv:2311.13884. Retrieved from https:\/\/arxiv.org\/abs\/2311.13884"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49660.2025.10889971"},{"key":"e_1_3_1_113_2","unstructured":"Zhonghan Zhao Kewei Chen Dongxu Guo Wenhao Chai Tian Ye Yanting Zhang and Gaoang Wang. 2024. Hierarchical auto-organizing system for open-ended multi-agent navigation. arXiv:2403.08282. Retrieved from https:\/\/arxiv.org\/abs\/2403.08282"},{"key":"e_1_3_1_114_2","unstructured":"Saaket Agashe Yue Fan and Xin Eric Wang. 2023. Evaluating multi-agent coordination abilities in large language models. arXiv:2310.03903. Retrieved from https:\/\/arxiv.org\/abs\/2310.03903"},{"key":"e_1_3_1_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610676"},{"key":"e_1_3_1_116_2","unstructured":"Jijia Liu Chao Yu Jiaxuan Gao Yuqing Xie Qingmin Liao Yi Wu and Yu Wang. 2023. LLM-powered hierarchical language agent for real-time human-ai coordination. arXiv:2312.15224. Retrieved from https:\/\/arxiv.org\/abs\/2312.15224"},{"key":"e_1_3_1_117_2","doi-asserted-by":"crossref","unstructured":"Lance Ying Kunal Jha Shivam Aarya Joshua B. Tenenbaum Antonio Torralba and Tianmin Shu. 2024. GOMA: Proactive embodied cooperative communication via goal-oriented mental alignment. (2024). arXiv:cs.HC\/2403.11075 Version 2.","DOI":"10.1109\/IROS58592.2024.10802144"},{"key":"e_1_3_1_118_2","unstructured":"Jun Wang Guocheng He and Yiannis Kantaros. 2024. Safe task planning for language-instructed multi-robot systems using conformal prediction. arXiv:2402.15368. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.15368"},{"key":"e_1_3_1_119_2","first-page":"42829","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhou Kaiwen","year":"2023","unstructured":"Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, and Xin Eric Wang. 2023. Esc: Exploration with soft commonsense constraints for zero-shot object navigation. In Proceedings of the International Conference on Machine Learning. PMLR, 42829\u201342842."},{"key":"e_1_3_1_120_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29858"},{"key":"e_1_3_1_121_2","unstructured":"Weizheng Wang Le Mao Ruiqi Wang and Byung-Cheol Min. 2024. SRLM: Human-in-loop interactive social robot navigation with large language model and deep reinforcement learning. arXiv:2403.15648. Retrieved from https:\/\/arxiv.org\/abs\/2403.15648"},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3011912"},{"key":"e_1_3_1_123_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2022.104132"},{"key":"e_1_3_1_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/AIRC64931.2025.11077500"},{"key":"e_1_3_1_125_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA55743.2025.11128105"},{"key":"e_1_3_1_126_2","doi-asserted-by":"crossref","unstructured":"Daniel Honerkamp Martin Buchner Fabien Despinoy Tim Welschehold and Abhinav Valada. 2024. Language-grounded dynamic scene graphs for interactive object search with mobile manipulation. IEEE Robotics and Automation Letters 9 (2024) 8298\u20138305.","DOI":"10.1109\/LRA.2024.3441495"},{"key":"e_1_3_1_127_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2024.3485907"},{"key":"e_1_3_1_128_2","unstructured":"Dhruv Shah Michael Robert Equi B\u0142a\u017cej Osi\u0144ski Fei Xia Brian Ichter and Sergey Levine. 2023. Navigation with large language models: Semantic guesswork as a heuristic for planning. In Proceedings of The 7th Conference on Robot Learning (CoRL 2023). 
PMLR 2683\u20132699."},{"key":"e_1_3_1_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/ECMR65884.2025.11163198"},{"key":"e_1_3_1_130_2","doi-asserted-by":"publisher","DOI":"10.1109\/ROBIO64047.2024.10907658"},{"key":"e_1_3_1_131_2","doi-asserted-by":"crossref","unstructured":"Advaith Balaji Saket Pradhan and Dmitry Berenson. 2025. Language-guided object search in agricultural environments. In 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE.","DOI":"10.1109\/ICRA55743.2025.11128175"},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00387"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01000"},{"key":"e_1_3_1_134_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i03.5627"},{"key":"e_1_3_1_135_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01250"},{"key":"e_1_3_1_136_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01281"},{"key":"e_1_3_1_137_2","doi-asserted-by":"crossref","unstructured":"Khanh Nguyen and Hal Daum\u00e9 III. 2019. Help Anna! visual navigation with natural multimodal assistance via retrospective curiosity-encouraging imitation learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) Association for Computational Linguistics 684\u2013695.","DOI":"10.18653\/v1\/D19-1063"},{"key":"e_1_3_1_138_2","unstructured":"An Yan Xin Eric Wang Jiangtao Feng Lei Li and William Yang Wang. 2019. Cross-lingual vision-language navigation. arXiv:1910.11301. 
Retrieved from https:\/\/arxiv.org\/abs\/1910.11301"},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01075"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3193254"},{"key":"e_1_3_1_141_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00430"},{"key":"e_1_3_1_142_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i2.20097"},{"key":"e_1_3_1_143_2","unstructured":"Yi Wu Yuxin Wu Georgia Gkioxari and Yuandong Tian. 2018. Building generalizable agents with a realistic and rich 3D environment. (2018). arXiv:cs.AI\/1801.02209 Version 1."},{"key":"e_1_3_1_144_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Das Abhishek","year":"2018","unstructured":"Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, and Dhruv Batra. 2018. Embodied question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u201310."},{"key":"e_1_3_1_145_2","doi-asserted-by":"crossref","unstructured":"Angel Chang Angela Dai Thomas Funkhouser Maciej Halber Matthias Niesner Manolis Savva Shuran Song Andy Zeng and Yinda Zhang. 2017. Matterport3d: Learning from RGB-D data in indoor environments. (2017). arXiv:cs.CV\/1709.06158 Version 1.","DOI":"10.1109\/3DV.2017.00081"},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01282"},{"key":"e_1_3_1_147_2","doi-asserted-by":"crossref","unstructured":"Alexander Ku Peter Anderson Roma Patel Eugene Ie and Jason Baldridge. 2020. Room-across-room: multilingual vision-and-language navigation with dense spatiotemporal grounding. 
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) Association for Computational Linguistics 4392\u20134403.","DOI":"10.18653\/v1\/2020.emnlp-main.356"},{"key":"e_1_3_1_148_2","unstructured":"Eric Kolve Roozbeh Mottaghi Winson Han Eli VanderBilt Luca Weihs Alvaro Herrasti Matt Deitke Kiana Ehsani Daniel Gordon Yuke Zhu et\u00a0al. 2017. Ai2-thor: An interactive 3d environment for visual ai. arXiv:1712.05474. Retrieved from https:\/\/arxiv.org\/abs\/1712.05474"},{"key":"e_1_3_1_149_2","first-page":"1384","volume-title":"Proceedings of the Conference on Robot Learning","author":"Banerjee Shurjo","year":"2021","unstructured":"Shurjo Banerjee, Jesse Thomason, and Jason Corso. 2021. The robotslang benchmark: Dialog-guided robot localization and navigation. In Proceedings of the Conference on Robot Learning. PMLR, 1384\u20131393."},{"key":"e_1_3_1_150_2","first-page":"394","volume-title":"Proceedings of the Conference on Robot Learning","author":"Thomason Jesse","year":"2020","unstructured":"Jesse Thomason, Michael Murray, Maya Cakmak, and Luke Zettlemoyer. 2020. Vision-and-dialog navigation. In Proceedings of the Conference on Robot Learning. PMLR, 394\u2013406."},{"key":"e_1_3_1_151_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00945"},{"key":"e_1_3_1_152_2","doi-asserted-by":"crossref","unstructured":"Matt Deitke Eli VanderBilt Alvaro Herrasti Luca Weihs Jordi Salvador Kiana Ehsani Winson Han Eric Kolve Ali Farhadi Aniruddha Kembhavi and Roozbeh Mottaghi. 2022. ProcTHOR: large-scale embodied AI using procedural generation. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates Inc.","DOI":"10.52202\/068431-0433"},{"key":"e_1_3_1_153_2","unstructured":"Santhosh Kumar Ramakrishnan Aaron Gokaslan Erik Wijmans Oleksandr Maksymets Alexander Clegg John M. Turner Eric Undersander Wojciech Galuba Andrew Westbury Angel X. Chang Manolis Savva Yili Zhao and Dhruv Batra. 2021. 
Habitat-Matterport 3D dataset (HM3D): 1000 large-scale 3D environments for embodied AI. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1 (NeurIPS Datasets and Benchmarks 2021). Curran Associates Inc."},{"key":"e_1_3_1_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00477"},{"key":"e_1_3_1_155_2","unstructured":"Abhishek Padalkar Acorn Pooley Ajinkya Jain Alex Bewley Alex Herzog Alex Irpan Alexander Khazatsky Anant Rai Anikait Singh Anthony Brohan et\u00a0al. 2024. Open X-embodiment: Robotic learning datasets and RT-X models. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE 6892\u20136903."},{"key":"e_1_3_1_156_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561806"},{"key":"e_1_3_1_157_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58604-1_7"},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01411"},{"key":"e_1_3_1_159_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-020-01374-3"},{"key":"e_1_3_1_160_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6849"},{"key":"e_1_3_1_161_2","doi-asserted-by":"crossref","unstructured":"Harsh Mehta Yoav Artzi Jason Baldridge Eugene Ie and Piotr Mirowski. 2020. Retouchdown: Adding touchdown to streetlearn as a shareable resource for language grounding tasks in street view. (2020). arXiv:cs.CV\/2001.03671 Version 1.","DOI":"10.18653\/v1\/2020.splu-1.7"},{"key":"e_1_3_1_162_2","unstructured":"Piotr Mirowski Andras Banki-Horvath Keith Anderson Denis Teplyashin Karl Moritz Hermann Mateusz Malinowski Matthew Koichi Grimes Karen Simonyan Koray Kavukcuoglu Andrew Zisserman and Raia Hadsell. 2019. The streetlearn environment and dataset. (2019). arXiv:cs.CV\/1903.01292 Version 1."},{"key":"e_1_3_1_163_2","doi-asserted-by":"crossref","unstructured":"Yuchen Wu Pengcheng Zhang Meiying Gu Jin Zheng and Xiao Bai. 2024. 
Embodied navigation with multi-modal information: A survey from tasks to methodology. Information Fusion 112 C (2024) 102532.","DOI":"10.1016\/j.inffus.2024.102532"},{"key":"e_1_3_1_164_2","unstructured":"Peter Anderson Angel Chang Devendra Singh Chaplot Alexey Dosovitskiy Saurabh Gupta Vladlen Koltun Jana Kosecka Jitendra Malik Roozbeh Mottaghi Manolis Savva and Amir R. Zamir. 2018. On evaluation of embodied navigation agents. (2018). arXiv:cs.AI\/1807.06757 Version 1."},{"key":"e_1_3_1_165_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2021.103139"},{"key":"e_1_3_1_166_2","volume-title":"Heuristic Search: Theory and Applications","author":"Edelkamp Stefan","year":"2011","unstructured":"Stefan Edelkamp and Stefan Schr\u00f6dl. 2011. Heuristic Search: Theory and Applications. Elsevier."},{"key":"e_1_3_1_167_2","volume-title":"Proceedings of the I Can\u2019t Believe It\u2019s Not Better Workshop: Failure Modes in the Age of Foundation Models","author":"Sharma Manasi","year":"2023","unstructured":"Manasi Sharma. 2023. Exploring and improving the spatial reasoning abilities of large language models. In Proceedings of the I Can\u2019t Believe It\u2019s Not Better Workshop: Failure Modes in the Age of Foundation Models."},{"key":"e_1_3_1_168_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10610434"},{"key":"e_1_3_1_169_2","first-page":"5834","article-title":"History aware multimodal transformer for vision-and-language navigation","volume":"34","author":"Chen Shizhe","year":"2021","unstructured":"Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, and Ivan Laptev. 2021. History aware multimodal transformer for vision-and-language navigation. 
Advances in Neural Information Processing Systems 34 (2021), 5834\u20135847.","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3802539","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T09:33:42Z","timestamp":1777282422000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3802539"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,25]]},"references-count":168,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2026,8,30]]}},"alternative-id":["10.1145\/3802539"],"URL":"https:\/\/doi.org\/10.1145\/3802539","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,25]]},"assertion":[{"value":"2024-09-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-30","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-04-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}