{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T19:38:58Z","timestamp":1771702738196,"version":"3.50.1"},"reference-count":47,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>This paper introduces Alter3, a humanoid robot that demonstrates spontaneous motion generation through the integration of GPT-4, a cutting-edge Large Language Model (LLM). This integration overcomes the challenge of applying LLMs to direct robot control, which typically struggles with the hardware-specific nuances of robotic operation. By translating linguistic descriptions of human actions into robotic movements via programming, Alter3 can autonomously perform a diverse range of actions, such as adopting a \u201cselfie\u201d pose or simulating a \u201cghost.\u201d This approach not only shows Alter3\u2019s few-shot learning capabilities but also its adaptability to verbal feedback for pose adjustments without manual fine-tuning. 
This research advances the field of humanoid robotics by bridging linguistic concepts with physical embodiment and opens new avenues for exploring spontaneity in humanoid robots.<\/jats:p>","DOI":"10.3389\/frobt.2025.1581110","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T05:24:42Z","timestamp":1748323482000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["From text to motion: grounding GPT-4 in a humanoid robot \u201cAlter3\u201d"],"prefix":"10.3389","volume":"12","author":[{"given":"Takahide","family":"Yoshida","sequence":"first","affiliation":[]},{"given":"Atsushi","family":"Masumori","sequence":"additional","affiliation":[]},{"given":"Takashi","family":"Ikegami","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,5,27]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.01691","article-title":"Do as I can, not as I say: grounding language in robotic affordances","author":"Ahn","year":"2022","journal-title":"arXiv cs.RO"},{"key":"B3","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1037\/h0030834","article-title":"Integration theory and attitude change","volume":"78","author":"Anderson","year":"1963","journal-title":"Psychol. 
Rev."},{"key":"B4","volume-title":"A cognitive theory of consciousness","author":"Bernard","year":"1988"},{"key":"B5","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2307.15818","article-title":"RT-2: vision-language-action models transfer web knowledge to robotic control","author":"Brohan","year":"","journal-title":"arXiv"},{"key":"B6","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2023.XIX.025","article-title":"Rt-1: robotics transformer for real-world control at scale","author":"Brohan","year":""},{"key":"B7","article-title":"Language models are few-shot learners","author":"Brown","year":"2020"},{"key":"B8","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2024.XX.107","article-title":"Expressive whole-body control for humanoid robots","author":"Cheng","year":"2024"},{"key":"B9","doi-asserted-by":"crossref","DOI":"10.1093\/acprof:oso\/9780190217013.001.0001","volume-title":"Surfing uncertainty: prediction, action, and the embodied mind","author":"Clark","year":"2016"},{"key":"B10","article-title":"Open x-embodiment: robotic learning datasets and rt-x models","author":"Collaboration","year":"2024"},{"key":"B11","doi-asserted-by":"publisher","first-page":"2086","DOI":"10.1109\/iros55552.2023.10342169","article-title":"Task and motion planning with large language models for object rearrangement","author":"Ding","year":"2023","journal-title":"arXiv"},{"key":"B12","first-page":"19","article-title":"Organismically-inspired robotics: homeostatic adaptation and teleology beyond the closed sensorimotor loop","volume-title":"Dynamic systems approach for embodiment and sociality: from ecological psychology to robotics","author":"Di Paolo","year":"2003"},{"key":"B13","first-page":"490","volume-title":"A new design principle for an autonomous robot","author":"Doi","year":"2017"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.03378","article-title":"PaLM-E: an embodied multimodal language 
model","author":"Driess","year":"2023","journal-title":"arXiv"},{"key":"B15","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1038\/nrn2787","article-title":"The free-energy principle: a unified brain theory?","volume":"11","author":"Friston","year":"2010","journal-title":"Nat. Rev. Neurosci."},{"key":"B16","first-page":"4105","article-title":"Comic: complementary task learning & mimicry for reusable skills","volume-title":"Proceedings of the 37th international conference on machine learning","author":"Hasenclever","year":"2020"},{"key":"B17","doi-asserted-by":"publisher","first-page":"8944","DOI":"10.1109\/iros58592.2024.10801984","article-title":"Learning human-to-humanoid real-time whole-body teleoperation","author":"He","year":"2024","journal-title":"arXiv"},{"key":"B18","doi-asserted-by":"publisher","first-page":"110663","DOI":"10.1016\/j.isci.2024.110663","article-title":"Impact of social context on human facial and gestural emotion expressions","volume":"27","author":"Heesen","year":"2024","journal-title":"iScience"},{"key":"B19","doi-asserted-by":"publisher","first-page":"1321","DOI":"10.1109\/robot.1998.677288","article-title":"The development of honda humanoid robot","volume":"2","author":"Hirai","year":"1998","journal-title":"Proc. 
IEEE"},{"key":"B20","doi-asserted-by":"publisher","first-page":"10608","DOI":"10.1109\/icra48891.2023.10160969","article-title":"Visual language maps for robot navigation","author":"Huang","year":"","journal-title":"arXiv"},{"key":"B21","first-page":"59636","article-title":"Grounded decoding: guiding text generation with grounded models for embodied agents","volume-title":"Advances in neural information processing systems","author":"Huang","year":""},{"key":"B22","article-title":"Can mutual imitation generate open-ended evolution?","volume-title":"In the proceedings of Artificial Life 2021 workshop on OEE","author":"Ikegami","year":"2021"},{"key":"B23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.5555\/1248547.1248548","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Janez","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"B24","first-page":"7","article-title":"Cybernetic human HRP-4C","author":"Kaneko","year":"2009"},{"key":"B25","doi-asserted-by":"publisher","first-page":"15917","DOI":"10.1038\/s41598-024-65604-1","article-title":"Gromov\u2013wasserstein unsupervised alignment reveals structural correspondences between the color similarity structures of humans and large language models","volume":"14","author":"Kawakita","year":"2024","journal-title":"Sci. Rep."},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2209.07753","article-title":"Code as policies: language model programs for embodied control","author":"Liang","year":"2023","journal-title":"arXiv"},{"key":"B27","article-title":"Universal humanoid motion representations for physics-based control","author":"Luo","year":"2024"},{"key":"B28","doi-asserted-by":"publisher","first-page":"21445","DOI":"10.1038\/s41598-024-72071-1","article-title":"Large language models predict human sensory judgments across six modalities","volume":"14","author":"Marjieh","year":"2024","journal-title":"Sci. 
Rep."},{"key":"B29","doi-asserted-by":"publisher","first-page":"7","DOI":"10.3389\/frobt.2020.532375","article-title":"Personogenesis through imitating human behavior in a humanoid robot alter3","author":"Masumori","year":"2021","journal-title":"Front. Robot."},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.08774","article-title":"GPT-4 technical report","year":"2023","journal-title":"arXiv"},{"key":"B31","first-page":"11072","article-title":"Realistic and interactive robot gaze","author":"Pan","year":"2020"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3197517.3201311","article-title":"Deepmimic: example-guided deep reinforcement learning of physics-based character skills","volume":"37","author":"Peng","year":"","journal-title":"ACM Trans. Graph."},{"key":"B33","first-page":"3803","article-title":"Sim-to-real transfer of robotic control with dynamics randomization","author":"Peng","year":""},{"key":"B34","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1017\/s0140525x00005756","article-title":"Minds, brains, and programs","volume":"3","author":"Searle","year":"1980","journal-title":"Behav. 
Brain Sci."},{"key":"B35","doi-asserted-by":"publisher","first-page":"969","DOI":"10.1016\/j.tics.2018.08.008","article-title":"Being a beast machine: the somatic basis of selfhood","volume":"22","author":"Seth","year":"2018","journal-title":"Trends Cognitive Sci."},{"key":"B36","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2210.05663","article-title":"CLIP-fields: weakly supervised semantic fields for robotic memory","author":"Shafiullah","year":"2023","journal-title":"arXiv"},{"key":"B37","doi-asserted-by":"publisher","first-page":"16236","DOI":"10.1109\/ICRA57147.2024.10610948","article-title":"Prompt, plan, perform: LLM-based humanoid control via quantized imitation learning","author":"Sun","year":"2024"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.07580","article-title":"SayTap: language to quadrupedal locomotion","author":"Tang","year":"2023","journal-title":"arXiv"},{"key":"B39","volume-title":"Mind in life: biology, phenomenology, and the sciences of mind","author":"Thompson","year":"2007"},{"key":"B40","doi-asserted-by":"publisher","first-page":"72","DOI":"10.1006\/brcg.1997.0907","article-title":"Patterns of life: intertwining identity and cognition","volume":"34","author":"Varela","year":"1997","journal-title":"Brain Cogn."},{"key":"B41","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1023\/A:1020368120174","article-title":"Life after kant: natural purposes and the autopoietic foundations of biological individuality","volume":"1","author":"Weber","year":"2002","journal-title":"Phenomenology Cognitive Sci."},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.11903","article-title":"Chain-of-thought prompting elicits reasoning in large language models","author":"Wei","year":"2023","journal-title":"arXiv"},{"key":"B43","article-title":"Minimal self in humanoid robot \u201calter3\u201d driven by large language model. vol. 
ALIFE 2024","author":"Yoshida","year":"2024"},{"key":"B44","doi-asserted-by":"crossref","DOI":"10.1162\/isal_a_00635","article-title":"Development of concept representation of behavior through mimicking and imitation in a humanoid robot Alter3","author":"Yoshida","year":"2023"},{"key":"B45","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.08647","article-title":"Language to rewards for robotic skill synthesis","author":"Yu","year":"2023","journal-title":"arXiv"},{"key":"B46","article-title":"Socratic models: composing zero-shot multimodal reasoning with language","author":"Zeng","year":"2023"},{"key":"B47","doi-asserted-by":"publisher","first-page":"7961","DOI":"10.1109\/iros55552.2023.10341488","article-title":"Large language models as zero-shot human models for human-robot interaction","author":"Zhang","year":"2023","journal-title":"arXiv"},{"key":"B48","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.11107","article-title":"ChatABL: abductive learning via natural language interaction with ChatGPT","author":"Zhong","year":"2023","journal-title":"arXiv"}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2025.1581110\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T05:24:53Z","timestamp":1748323493000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2025.1581110\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,27]]},"references-count":47,"alternative-id":["10.3389\/frobt.2025.1581110"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2025.1581110","relation":{},"ISSN":["2296-9144"],"issn-type":[{"value":"2296-9144","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,27]]},"article-number":"1581110"}}