{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T12:57:53Z","timestamp":1776085073121,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,3,11]],"date-time":"2024-03-11T00:00:00Z","timestamp":1710115200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,3,11]]},"DOI":"10.1145\/3610978.3640723","type":"proceedings-article","created":{"date-parts":[[2024,3,10]],"date-time":"2024-03-10T22:55:43Z","timestamp":1710111343000},"page":"808-812","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["The Conversation is the Command: Interacting with Real-World Autonomous Robots Through Natural Language"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1767-2140","authenticated-orcid":false,"given":"Linus","family":"Nwankwo","sequence":"first","affiliation":[{"name":"Chair of Cyber-Physical Systems, Montanuniversit\u00e4t, Leoben, Styria, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1221-8253","authenticated-orcid":false,"given":"Elmar","family":"Rueckert","sequence":"additional","affiliation":[{"name":"Chair of Cyber-Physical Systems, Montanuniversit\u00e4t, Leoben, Styria, Austria"}]}],"member":"320","published-online":{"date-parts":[[2024,3,11]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Michael Ahn Anthony Brohan Noah Brown Yevgen Chebotar Omar Cortes Byron David Chelsea Finn Chuyuan Fu Keerthana Gopalakrishnan Karol Hausman Alex Herzog Daniel Ho Jasmine Hsu Julian Ibarz Brian Ichter Alex Irpan Eric Jang Rosario Jauregui Ruano Kyle Jeffrey Sally Jesmonth Nikhil J Joshi Ryan Julian Dmitry Kalashnikov 
Yuheng Kuang Kuang-Huei Lee Sergey Levine Yao Lu Linda Luu Carolina Parada Peter Pastor Jornell Quiambao Kanishka Rao Jarek Rettinghouse Diego Reyes Pierre Sermanet Nicolas Sievers Clayton Tan Alexander Toshev Vincent Vanhoucke Fei Xia Ted Xiao Peng Xu Sichun Xu Mengyuan Yan and Andy Zeng. 2022. Do As I Can Not As I Say: Grounding Language in Robotic Affordances. arxiv: 2204.01691 [cs.RO]"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2157689.2157815"},{"key":"e_1_3_2_2_3_1","volume-title":"Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach.","author":"Black Sid","year":"2022","unstructured":"Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. arxiv: 2204.06745 [cs.CL]"},{"key":"e_1_3_2_2_4_1","unstructured":"Anthony Brohan Noah Brown Justice Carbajal Yevgen Chebotar Xi Chen Krzysztof Choromanski Tianli Ding Danny Driess Avinava Dubey Chelsea Finn Pete Florence Chuyuan Fu Montse Gonzalez Arenas Keerthana Gopalakrishnan Kehang Han Karol Hausman Alexander Herzog Jasmine Hsu Brian Ichter Alex Irpan Nikhil Joshi Ryan Julian Dmitry Kalashnikov Yuheng Kuang Isabel Leal Lisa Lee Tsang-Wei Edward Lee Sergey Levine Yao Lu Henryk Michalewski Igor Mordatch Karl Pertsch Kanishka Rao Krista Reymann Michael Ryoo Grecia Salazar Pannag Sanketi Pierre Sermanet Jaspiar Singh Anikait Singh Radu Soricut Huong Tran Vincent Vanhoucke Quan Vuong Ayzaan Wahid Stefan Welker Paul Wohlhart Jialin Wu Fei Xia Ted Xiao Peng Xu Sichun Xu Tianhe Yu and Brianna Zitkovich. 2023 a. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. 
arxiv: 2307.15818 [cs.RO]"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Anthony Brohan Noah Brown Justice Carbajal Yevgen Chebotar Joseph Dabis Chelsea Finn Keerthana Gopalakrishnan Karol Hausman Alex Herzog Jasmine Hsu Julian Ibarz Brian Ichter Alex Irpan Tomas Jackson Sally Jesmonth Nikhil J Joshi Ryan Julian Dmitry Kalashnikov Yuheng Kuang Isabel Leal Kuang-Huei Lee Sergey Levine Yao Lu Utsav Malla Deeksha Manjunath Igor Mordatch Ofir Nachum Carolina Parada Jodilyn Peralta Emily Perez Karl Pertsch Jornell Quiambao Kanishka Rao Michael Ryoo Grecia Salazar Pannag Sanketi Kevin Sayed Jaspiar Singh Sumedh Sontakke Austin Stone Clayton Tan Huong Tran Vincent Vanhoucke Steve Vega Quan Vuong Fei Xia Ted Xiao Peng Xu Sichun Xu Tianhe Yu and Brianna Zitkovich. 2023 b. RT-1: Robotics Transformer for Real-World Control at Scale. arxiv: 2212.06817 [cs.RO]","DOI":"10.15607\/RSS.2023.XIX.025"},{"key":"e_1_3_2_2_6_1","volume-title":"Language Models are Few-Shot Learners. arxiv","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arxiv: 2005.14165 [cs.CL]"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614905"},{"key":"e_1_3_2_2_8_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv: 1810.04805 [cs.CL]"},{"key":"e_1_3_2_2_9_1","unstructured":"Miguel \u00c1. Gonz\u00e1lez-Santamarta Francisco J. Rodr\u00edguez-Lera \u00c1ngel Manuel Guerrero-Higueras and Vicente Matell\u00e1n-Olivera. 2023. Integration of Large Language Models within Cognitive Architectures for Autonomous Robots. arxiv: 2309.14945 [cs.RO]"},{"key":"e_1_3_2_2_10_1","unstructured":"Wenlong Huang Pieter Abbeel Deepak Pathak and Igor Mordatch. 2022. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. arxiv: 2201.07207 [cs.LG]"},{"key":"e_1_3_2_2_11_1","unstructured":"Glenn Jocher Ayush Chaurasia and Jing Qiu. 2023. Ultralytics YOLOv8. https:\/\/github.com\/ultralytics\/ultralytics"},{"key":"e_1_3_2_2_12_1","volume-title":"Norman Di Palo, and Edward Johns","author":"Kwon Teyun","year":"2023","unstructured":"Teyun Kwon, Norman Di Palo, and Edward Johns. 2023. Language Models as Zero-Shot Trajectory Generators. arxiv: 2310.11604 [cs.RO]"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3571599"},{"key":"e_1_3_2_2_14_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. arxiv","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arxiv: 1907.11692 [cs.CL]"},{"key":"e_1_3_2_2_15_1","volume-title":"Interactive Language: Talking to Robots in Real Time. arxiv: 2210.06407 [cs.RO]","author":"Lynch Corey","year":"2022","unstructured":"Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, and Pete Florence. 2022. Interactive Language: Talking to Robots in Real Time. 
arxiv: 2210.06407 [cs.RO]"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2022.870477"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ohx.2023.e00426"},{"key":"e_1_3_2_2_18_1","volume-title":"Advances in Service and Industrial Robotics, Tadej Petri\u010d, Ale\u0161 Ude, and Leon \u017dlajpah (Eds.)","author":"Nwankwo Linus","unstructured":"Linus Nwankwo and Elmar Rueckert. 2023. Understanding Why SLAM Algorithms Fail in Modern Indoor Environments. In Advances in Service and Industrial Robotics, Tadej Petri\u010d, Ale\u0161 Ude, and Leon \u017dlajpah (Eds.). Springer Nature Switzerland, Cham, 186--194."},{"key":"e_1_3_2_2_19_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arxiv: 2303.08774 [cs.CL]"},{"key":"e_1_3_2_2_20_1","volume-title":"ICRA Workshop on Open Source Software","volume":"3","author":"Quigley Morgan","year":"2009","unstructured":"Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Ng. 2009. ROS: an open-source Robot Operating System. ICRA Workshop on Open Source Software, Vol. 3."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2018.03.014"},{"key":"e_1_3_2_2_22_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. arxiv: 2103.00020 [cs.CV]"},{"key":"e_1_3_2_2_23_1","unstructured":"Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. 
https:\/\/api.semanticscholar.org\/CorpusID:160025533"},{"key":"e_1_3_2_2_24_1","unstructured":"Aditya Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. arxiv: 2102.12092 [cs.CV]"},{"key":"e_1_3_2_2_25_1","unstructured":"Juan Rocamonde Victoriano Montesinos Elvis Nava Ethan Perez and David Lindner. 2023. Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning. arxiv: 2310.12921 [cs.LG]"},{"key":"e_1_3_2_2_26_1","volume-title":"a distilled version of BERT: smaller, faster, cheaper and lighter. arxiv","author":"Sanh Victor","year":"2020","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arxiv: 1910.01108 [cs.CL]"},{"key":"e_1_3_2_2_27_1","unstructured":"Christoph Schuhmann Richard Vencu Romain Beaumont Robert Kaczmarczyk Clayton Mullis Aarush Katta Theo Coombes Jenia Jitsev and Aran Komatsuzaki. 2021. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs. arxiv: 2111.02114 [cs.CV]"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3523760.3523933"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2023.1084000"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3568162.3578810"},{"key":"e_1_3_2_2_31_1","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arxiv: 2302.13971 [cs.CL]"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3571718"},{"key":"e_1_3_2_2_33_1","unstructured":"Yaqi Xie Chen Yu Tongyao Zhu Jinbin Bai Ze Gong and Harold Soh. 2023. 
Translating Natural Language to Planning Goals with Large-Language Models. arxiv: 2302.05128 [cs.CL]"},{"key":"e_1_3_2_2_34_1","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Zhou Kaiwen","year":"2023","unstructured":"Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, and Xin Eric Wang. 2023 b. ESC: Exploration with Soft Commonsense Constraints for Zero-Shot Object Navigation. In Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA) (ICML'23). JMLR.org, Article 1806, 14 pages."},{"key":"e_1_3_2_2_35_1","unstructured":"Wangchunshu Zhou Yuchen Eleanor Jiang Long Li Jialong Wu Tiannan Wang Shi Qiu Jintian Zhang Jing Chen Ruipu Wu Shuai Wang Shiding Zhu Jiyu Chen Wentao Zhang Ningyu Zhang Huajun Chen Peng Cui and Mrinmaya Sachan. 2023 a. Agents: An Open-source Framework for Autonomous Language Agents. arxiv: 2309.07870 [cs.CL]"}],"event":{"name":"HRI '24: ACM\/IEEE International Conference on Human-Robot Interaction","location":"Boulder CO USA","acronym":"HRI '24","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Companion of the 2024 ACM\/IEEE International Conference on Human-Robot 
Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610978.3640723","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3610978.3640723","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T01:15:04Z","timestamp":1755825304000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610978.3640723"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,11]]},"references-count":35,"alternative-id":["10.1145\/3610978.3640723","10.1145\/3610978"],"URL":"https:\/\/doi.org\/10.1145\/3610978.3640723","relation":{},"subject":[],"published":{"date-parts":[[2024,3,11]]},"assertion":[{"value":"2024-03-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}