{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:18:21Z","timestamp":1757618301187,"version":"3.44.0"},"reference-count":15,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T00:00:00Z","timestamp":1749772800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T00:00:00Z","timestamp":1749772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Life Robotics"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Large Language Models (LLMs) are a type of machine learning model trained on vast amounts of natural language that have demonstrated novel capabilities in tasks such as text prediction and generation. These tasks allow LLMs to be remarkably suited for understanding the semantics of natural language, which in turn enables applications such as planning real world tasks, writing code for computers, and translating between human languages. Even though LLMs could provide more flexibility in interpreting user requests and have shown to possess some commonsense knowledge, their capabilities for translating natural language instructions into code to control robot actions is only starting to be explored. More specifically, in this paper we are interested in the control of robots tasked with preparing cocktails. Within this context, it is assumed that the LLM has access to a repository of well-formatted recipes. 
This means that each recipe is written according to the following layout: a list of ingredients, followed by a description of how to prepare and mix the various items. Moreover, a set of low-level modules responsible for robot manipulation and vision-related tasks is also provided to the LLM in the form of an application programming interface (API). Consequently, the main focus of the LLM is on generating a sequence of calls to the API, along with the right parameters, to produce the cocktail requested by users in natural language. Here, we show that it is feasible for LLMs to perform this type of translation on a small number of custom modules, and that certain techniques provide a measurable benefit to the accuracy and consistency of this task without fine-tuning. In particular, we found that the use of an ensemble-voting strategy, where multiple trials are repeated and the most common answer is selected, increases accuracy to a certain extent. In addition, there is moderate support for the use of natural language parsing to adjust the prompt of the LLM prior to translation. Lastly, building on previous knowledge, we also provide a set of guidelines to help design prompts that improve the accuracy of the resulting sequence of actions. In general, these results suggest that while LLMs can be used as translators of robot instructions, they are best applied in conjunction with these other strategies.
These findings could influence future robotics development, as they provide directions for implementing LLMs more effectively and for broadening the accessibility of robotic control to users without an extensive software background.<\/jats:p>","DOI":"10.1007\/s10015-025-01031-3","type":"journal-article","created":{"date-parts":[[2025,6,12]],"date-time":"2025-06-12T23:52:57Z","timestamp":1749772377000},"page":"407-416","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Robots reading recipes: large language models as translators between humans and machines"],"prefix":"10.1007","volume":"30","author":[{"given":"Oliver","family":"Wang","sequence":"first","affiliation":[]},{"given":"Grant","family":"Cheng","sequence":"additional","affiliation":[]},{"given":"Luc","family":"Caspar","sequence":"additional","affiliation":[]},{"given":"Akira","family":"Yokota","sequence":"additional","affiliation":[]},{"given":"Mahdi","family":"Khosravy","sequence":"additional","affiliation":[]},{"given":"Olaf","family":"Witkowski","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,13]]},"reference":[{"key":"1031_CR1","unstructured":"Ahn M, Brohan A, Brown N, Chebotar Y, Cortes O, David B, Finn C, Fu C, Gopalakrishnan K, Hausman K, Herzog A, Ho D, Hsu J, Ibarz J, Ichter B, Irpan A, Jang E, Ruano RJ, Jeffrey K, Jesmonth S, Joshi NJ, Julian R, Kalashnikov D, Kuang Y, Lee K-H, Levine S, Lu Y, Luu L, Parada C, Pastor P, Quiambao J, Rao K, Rettinghouse J, Reyes D, Sermanet P, Sievers N, Tan C, Toshev A, Vanhoucke V, Xia F, Xiao T, Xu P, Xu S, Yan M, Zeng A (2022) Do as I can, not as I say: grounding language in robotic affordances.
arXiv:2204.01691"},{"key":"1031_CR2","unstructured":"Almazrouei E, Alobeidli H, Alshamsi A, Cappelli A, Cojocaru R, Debbah M, Goffinet \u00c9, Hesslow D, Launay J, Malartic Q, Mazzotta D, Noune B, Pannier B, Penedo G (2023) The falcon series of open language models. arXiv:2311.16867"},{"key":"1031_CR3","volume-title":"Natural language processing with Python: analyzing text with the natural language toolkit","author":"S Bird","year":"2009","unstructured":"Bird S, KE., Loper E, (2009) Natural language processing with Python: analyzing text with the natural language toolkit. O\u2019Reilly Media, Inc, Sebastopol"},{"key":"1031_CR4","unstructured":"Cheng G, Zhang C, Cai W, Zhao L, Sun C, Bian J (2024) Empowering large language models on robotic manipulation with affordance prompting. arXiv:2404.11027"},{"key":"1031_CR5","unstructured":"Beeching E, Fourrier C, Habib N, Han S, Lambert N, Rajani N, Sanseviero O, Tunstall L, Wolf T (2023) Open LLM leaderboard (2023\u20132024). Hugging Face. Retrieved July 23, 2023 from https:\/\/huggingface.co\/spaces\/open-llm-leaderboard-old\/open_llm_leaderboard"},{"issue":"5","key":"1031_CR6","doi-asserted-by":"publisher","first-page":"1091","DOI":"10.1007\/s11370-024-00550-5","volume":"17","author":"Y Kim","year":"2024","unstructured":"Kim Y, Kim D, Choi J, Park J, Oh N, Park D (2024) A survey on integration of large language models with intelligent robots. Intel Serv Robot 17(5):1091\u20131107","journal-title":"Intel Serv Robot"},{"key":"1031_CR7","unstructured":"Kira Z (2022) Awesome-LLM-robotics. https:\/\/github.com\/GT-RIPL\/Awesome-LLM-Robotics"},{"key":"1031_CR8","doi-asserted-by":"publisher","first-page":"66467","DOI":"10.1109\/ACCESS.2022.3182399","volume":"10","author":"D M\u00fcller","year":"2022","unstructured":"M\u00fcller D, Soto-Rey I, Kramer F (2022) An analysis on ensemble learning optimized medical image classification with deep convolutional neural networks. Ieee Access. 
10:66467\u201366480","journal-title":"Ieee Access."},{"issue":"2","key":"1031_CR9","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1007\/s10846-017-0468-y","volume":"86","author":"A Polydoros","year":"2017","unstructured":"Polydoros A, Nalpantidis L (2017) Survey of model-based reinforcement learning: applications on robotics. J Intell Robot Syst. Robot Syst. 86(2):153\u2013173","journal-title":"J Intell Robot Syst. Robot Syst."},{"key":"1031_CR10","doi-asserted-by":"crossref","unstructured":"Vemprala S, Bonatti R, Bucker A, Kapoor A (2023) ChatGPT for robotics: design principles and model abilities. arXiv:2306.17582","DOI":"10.1109\/ACCESS.2024.3387941"},{"key":"1031_CR11","unstructured":"Valmeekam K, Marquez M, Sreedharan S, Kambhampati S (2023) On the planning abilities of large language models: a critical investigation. arXiv:2305.15771"},{"key":"1031_CR12","doi-asserted-by":"crossref","unstructured":"Wang J, Wu Z, Li Y, Jiang H, Shu P, Shi E, Hu H, Ma C, Liu Y, Wang X, Yao Y, Liu X, Zhao H, Liu Z, Dai H, Zhao L, Ge B, Li X, Liu T, Zhang S (2024). Large language models for robotics: Opportunities, challenges, and perspectives.","DOI":"10.1016\/j.jai.2024.12.003"},{"issue":"3","key":"1031_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3386252","volume":"53","author":"Y Wang","year":"2020","unstructured":"Wang Y, Yao Q, Kwok JT, Ni LM (2020) Generalizing from a few examples: a survey on few-shot learning. ACM Comput. Surv. 53(3):1\u201334","journal-title":"ACM Comput. Surv."},{"key":"1031_CR14","unstructured":"Zeng A, Attarian M, Ichter B, Choromanski K, Wong A, Welker S, Tombari F, Purohit A, Ryoo M, Sindhwani V, Lee J, Vanhoucke V, Florence P (2022) Socratic models: composing zero-shot multimodal reasoning with language. arXiv:2204.00598"},{"key":"1031_CR15","unstructured":"Zhao P, Zhang H, Yu Q, Wang Z, Geng Y, Fu F, Yang L, Zhang W, Jiang J, Cui B (2024) Retrieval-augmented generation for AI-generated content: a survey. 
arXiv:2402.19473"}],"container-title":["Artificial Life and Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10015-025-01031-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10015-025-01031-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10015-025-01031-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T19:20:00Z","timestamp":1757186400000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10015-025-01031-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,13]]},"references-count":15,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["1031"],"URL":"https:\/\/doi.org\/10.1007\/s10015-025-01031-3","relation":{},"ISSN":["1433-5298","1614-7456"],"issn-type":[{"type":"print","value":"1433-5298"},{"type":"electronic","value":"1614-7456"}],"subject":[],"published":{"date-parts":[[2025,6,13]]},"assertion":[{"value":"10 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 June 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 July 2025","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"The incorrect article note has been updated with the correct text. The original article has been corrected.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}