{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T23:07:52Z","timestamp":1778195272768,"version":"3.51.4"},"reference-count":128,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,12,27]],"date-time":"2025-12-27T00:00:00Z","timestamp":1766793600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T00:00:00Z","timestamp":1766966400000},"content-version":"vor","delay-in-days":2,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000288","name":"Royal Society","doi-asserted-by":"publisher","award":["RG\\R2\\232409"],"award-info":[{"award-number":["RG\\R2\\232409"]}],"id":[{"id":"10.13039\/501100000288","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100032827","name":"Advanced Research and Invention Agency","doi-asserted-by":"publisher","award":["SMRB-SE01-P06"],"award-info":[{"award-number":["SMRB-SE01-P06"]}],"id":[{"id":"10.13039\/100032827","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Bipedal robots are gaining global recognition due to their potential applications and the rapid advancements in artificial intelligence, particularly through deep reinforcement learning (DRL). While DRL has significantly advanced bipedal locomotion, the development of a unified framework capable of handling a wide range of tasks remains an ongoing challenge. This survey systematically categorises, compares, and analyses existing DRL frameworks for bipedal locomotion, organising them into end-to-end and hierarchical control schemes. 
End-to-end frameworks are evaluated based on their learning approaches, whereas hierarchical frameworks are examined in terms of their layered structures that integrate learning-based and traditional model-based methods. We provide a detailed evaluation of the composition, strengths, limitations, and capabilities of each framework. Furthermore, this survey identifies key research gaps and proposes future directions aimed at creating a more integrated and efficient unified framework for bipedal locomotion, with broad applicability in real-world environments.<\/jats:p>","DOI":"10.1007\/s10462-025-11451-z","type":"journal-article","created":{"date-parts":[[2025,12,27]],"date-time":"2025-12-27T07:06:30Z","timestamp":1766819190000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Deep reinforcement learning for robotic bipedal locomotion: a brief survey"],"prefix":"10.1007","volume":"59","author":[{"given":"Lingfan","family":"Bao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joseph","family":"Humphreys","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianhu","family":"Peng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengxu","family":"Zhou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,12,27]]},"reference":[{"key":"11451_CR1","unstructured":"6+ Hours Live Autonomous Robot Demo (2024). https:\/\/www.youtube.com\/watch?v=Ke468Mv8ldM"},{"key":"11451_CR2","doi-asserted-by":"crossref","unstructured":"Arm P, Mittal M, Kolvenbach H, Hutter M (2024) Pedipulate: Enabling manipulation skills using a quadruped robot\u2019s leg. 
In: IEEE Conference on Robotics and Automation","DOI":"10.1109\/ICRA57147.2024.10611307"},{"key":"11451_CR3","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/MSP.2017.2743240","volume":"34","author":"K Arulkumaran","year":"2017","unstructured":"Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: A brief survey. IEEE Signal Process Mag 34:26\u201338. https:\/\/doi.org\/10.1109\/MSP.2017.2743240","journal-title":"IEEE Signal Process Mag"},{"key":"11451_CR4","doi-asserted-by":"crossref","unstructured":"Atkeson CG, Babu BPW, Banerjee N, Berenson D, Bove CP, Cui X, DeDonato M, Du R, Feng S, Franklin P, et al (2015) No falls, no resets: Reliable humanoid behavior in the darpa robotics challenge. In: IEEE-RAS 15th International Conference on Humanoid Robot, pp 623\u2013630","DOI":"10.1109\/HUMANOIDS.2015.7363436"},{"key":"11451_CR5","doi-asserted-by":"publisher","first-page":"7490","DOI":"10.1109\/ACCESS.2023.3344393","volume":"12","author":"O Aydogmus","year":"2023","unstructured":"Aydogmus O, Yilmaz M (2023) Comparative analysis of reinforcement learning algorithms for bipedal robot locomotion. IEEE Access 12:7490\u20137499","journal-title":"IEEE Access"},{"issue":"6","key":"11451_CR6","doi-asserted-by":"publisher","first-page":"1343","DOI":"10.1109\/TRO.2017.2752711","volume":"33","author":"K Ayusawa","year":"2017","unstructured":"Ayusawa K, Yoshida E (2017) Motion retargeting for humanoid robots based on simultaneous morphing parameter identification and motion optimization. IEEE Trans Rob 33(6):1343\u20131357. https:\/\/doi.org\/10.1109\/TRO.2017.2752711","journal-title":"IEEE Trans Rob"},{"key":"11451_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2023.106941","volume":"126","author":"J Baltes","year":"2023","unstructured":"Baltes J, Christmann G, Saeedvand S (2023) A deep reinforcement learning algorithm to control a two-wheeled scooter with a humanoid robot. 
Eng Appl Artif Intell 126:106941","journal-title":"Eng Appl Artif Intell"},{"key":"11451_CR8","unstructured":"Bauer J, Baumli K, Behbahani F, Bhoopchand A, Bradley-Schmieg N, Chang M, Clay N, Collister A, Dasagi V, Gonzalez L, Gregor K, Hughes E, Kashem S, Loks-Thompson M, Openshaw H, Parker-Holder J, Pathak S, Perez-Nieves N, Rakicevic N, Rockt\u00e4schel T, Schroecker Y, Singh S, Sygnowski J, Tuyls K, York S, Zacherl A, Zhang L (2023) Human-timescale adaptation in an open-ended task space. In: Proceedings of the 40th International Conference on Machine Learning"},{"key":"11451_CR9","doi-asserted-by":"publisher","first-page":"172988141983958","DOI":"10.1177\/1729881419839584","volume":"16","author":"G Bingjing","year":"2019","unstructured":"Bingjing G, Jianhai H, Xiangpan L, Lin Y (2019) Human-robot interactive control based on reinforcement learning for gait rehabilitation training robot. Int J Adv Rob Syst 16:1729881419839584","journal-title":"Int J Adv Rob Syst"},{"key":"11451_CR10","doi-asserted-by":"crossref","unstructured":"Birk A, Coradeschi S, Tadokoro S (2003) RoboCup 2001: Robot Soccer World Cup V vol. 2377","DOI":"10.1007\/3-540-45603-1"},{"key":"11451_CR11","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1108\/IR-01-2015-0010","volume":"42","author":"R Bogue","year":"2015","unstructured":"Bogue R (2015) Underwater robots: a review of technologies and applications. Ind Robot 42:186\u2013191","journal-title":"Ind Robot"},{"key":"11451_CR12","unstructured":"Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) Openai gym. 
arXiv preprint arXiv:1606.01540"},{"key":"11451_CR13","unstructured":"Brohan A, Brown N, Carbajal J, Chebotar Y, Chen X, Choromanski K, Ding T, Driess D, Dubey A, Finn C, et al (2023) RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint https:\/\/arxiv.org\/abs\/2307.15818"},{"key":"11451_CR14","doi-asserted-by":"crossref","unstructured":"Byravan A, Humplik J, Hasenclever L, Brussee A, Nori F, Haarnoja T, Moran B, Bohez S, Sadeghi F, Vujatovic B, et al (2023) NeRF2Real: Sim2real transfer of vision-guided bipedal motion skills using neural radiance fields. In: IEEE International Conference on Robotics and Automation, pp. 9362\u20139369","DOI":"10.1109\/ICRA48891.2023.10161544"},{"key":"11451_CR15","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1007\/s43154-021-00059-0","volume":"2","author":"J Carpentier","year":"2021","unstructured":"Carpentier J, Wieber P-B (2021) Recent progress in legged robots locomotion control. Curr Robot Reports 2:231\u2013238. https:\/\/doi.org\/10.1007\/s43154-021-00059-0","journal-title":"Curr Robot Reports"},{"key":"11451_CR17","doi-asserted-by":"publisher","unstructured":"Castillo GA, Weng B, Zhang W, Hereid A (2021) Robust feedback motion policy design using reinforcement learning on a 3D digit bipedal robot. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp. 5136\u20135143. https:\/\/doi.org\/10.1109\/IROS51168.2021.9636467","DOI":"10.1109\/IROS51168.2021.9636467"},{"key":"11451_CR18","doi-asserted-by":"publisher","first-page":"20135","DOI":"10.1109\/ACCESS.2022.3151771","volume":"10","author":"GA Castillo","year":"2022","unstructured":"Castillo GA, Weng B, Zhang W, Hereid A (2022) Reinforcement learning-based cascade motion policy design for robust 3D bipedal locomotion. IEEE Access 10:20135\u201320148. 
https:\/\/doi.org\/10.1109\/ACCESS.2022.3151771","journal-title":"IEEE Access"},{"key":"11451_CR16","doi-asserted-by":"publisher","unstructured":"Castillo GA, Weng B, Yang S, Zhang W, Hereid A (2023) Template model inspired task space learning for robust bipedal locomotion. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 8582\u20138589. https:\/\/doi.org\/10.1109\/IROS55552.2023.10341263","DOI":"10.1109\/IROS55552.2023.10341263"},{"key":"11451_CR19","doi-asserted-by":"crossref","unstructured":"Chen AS, Lessing AM, Tang A, Chada G, Smith L, Levine S, Finn C (2024) Commonsense reasoning for legged robot adaptation with vision-language models. arXiv preprint arXiv:2407.02666","DOI":"10.1109\/ICRA55743.2025.11127234"},{"key":"11451_CR20","doi-asserted-by":"crossref","unstructured":"Cheng X, Ji Y, Chen J, Yang R, Yang G, Wang X (2024) Expressive whole-body control for humanoid robots. arXiv preprint arXiv:2402.16796","DOI":"10.15607\/RSS.2024.XX.107"},{"key":"11451_CR21","doi-asserted-by":"publisher","first-page":"2256","DOI":"10.1126\/scirobotics.ade2256","volume":"8","author":"S Choi","year":"2023","unstructured":"Choi S, Ji G, Park J, Kim H, Mun J, Lee JH, Hwangbo J (2023) Learning quadrupedal locomotion on deformable terrain. Sci Robot 8:2256. https:\/\/doi.org\/10.1126\/scirobotics.ade2256","journal-title":"Sci Robot"},{"key":"11451_CR22","doi-asserted-by":"crossref","unstructured":"Dankwa S, Zheng W (2019) Twin-delayed DDPG: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent. In: International Conference on Vision, Image and Signal Processing, pp. 1\u20135","DOI":"10.1145\/3387168.3387199"},{"key":"11451_CR23","doi-asserted-by":"crossref","unstructured":"Dao J, Duan H, Fern A (2023) Sim-to-real learning for humanoid box loco-manipulation. 
arXiv preprint arXiv:2310.03191","DOI":"10.1109\/ICRA57147.2024.10610977"},{"key":"11451_CR24","doi-asserted-by":"crossref","unstructured":"Duan H, Dao J, Green K, Apgar T, Fern A, Hurst J (2021) Learning task space actions for bipedal locomotion. In: IEEE International Conference on Robotics and Automation, pp 1276\u20131282","DOI":"10.1109\/ICRA48506.2021.9561705"},{"key":"11451_CR25","doi-asserted-by":"crossref","unstructured":"Duan H, Malik A, Dao J, Saxena A, Green K, Siekmann J, Fern A, Hurst J (2022a) Sim-to-real learning of footstep-constrained bipedal dynamic walking. In: International Conference on Robotics and Automation, pp 10428\u201310434","DOI":"10.1109\/ICRA46639.2022.9812015"},{"key":"11451_CR26","doi-asserted-by":"publisher","unstructured":"Duan H, Malik A, Gadde MS, Dao J, Fern A, Hurst J (2022b) Learning dynamic bipedal walking across stepping stones. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 6746\u20136752. https:\/\/doi.org\/10.1109\/IROS47612.2022.9981884","DOI":"10.1109\/IROS47612.2022.9981884"},{"key":"11451_CR27","doi-asserted-by":"publisher","first-page":"135","DOI":"10.3390\/app12010135","volume":"12","author":"A Dzedzickis","year":"2021","unstructured":"Dzedzickis A, Suba\u010di\u016bt\u0117-\u017demaitien\u0117 J, \u0160utinys E, Samukait\u0117-Bubnien\u0117 U, Bu\u010dinskas V (2021) Advanced applications of industrial robotics: New trends and possibilities. Appl Sci 12:135","journal-title":"Appl Sci"},{"key":"11451_CR28","unstructured":"Feng G, Zhang H, Li Z, Peng XB, Basireddy B, Yue L, Song Z, Yang L, Liu Y, Sreenath K, Levine S (2023) GenLoco: Generalized locomotion controllers for quadrupedal robots. 
In: Conference on Robot Learning, vol 205, pp 1893\u20131903"},{"issue":"5","key":"11451_CR29","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1177\/02783649241281508","volume":"44","author":"R Firoozi","year":"2023","unstructured":"Firoozi R, Tucker J, Tian S, Majumdar A, Sun J, Liu W, Zhu Y, Song S, Kapoor A, Hausman K et al (2023) Foundation models in robotics: Applications, challenges, and the future. Int J Robot Res 44(5):701\u2013739","journal-title":"Int J Robot Res"},{"key":"11451_CR30","unstructured":"Freeman CD, Frey E, Raichuk A, Girgin S, Mordatch I, Bachem O (2021) Brax \u2013 a differentiable physics engine for large scale rigid body simulation arXiv:2106.13281 [cs.RO]"},{"key":"11451_CR31","unstructured":"Fu Z, Cheng X, Pathak D (2023) Deep whole-body control: Learning a unified policy for manipulation and locomotion. In: Conference on Robot Learning, pp. 138\u2013149"},{"key":"11451_CR33","unstructured":"Fu Z, Zhao Q, Wu Q, Wetzstein G, Finn C (2024) Humanplus: Humanoid shadowing and imitation from humans. arXiv preprint arXiv:2406.10454"},{"key":"11451_CR32","doi-asserted-by":"publisher","unstructured":"Fuchioka Y, Xie Z, van de Panne M (2023) OPT-Mimic: Imitation of optimized trajectories for dynamic quadruped behaviors. In: IEEE International Conference on Robotics and Automation, pp. 5092\u20135098. https:\/\/doi.org\/10.1109\/ICRA48891.2023.10160562","DOI":"10.1109\/ICRA48891.2023.10160562"},{"key":"11451_CR34","doi-asserted-by":"publisher","first-page":"2908","DOI":"10.1109\/TRO.2022.3172469","volume":"38","author":"S Gangapurwala","year":"2022","unstructured":"Gangapurwala S, Geisert M, Orsolino R, Fallon M, Havoutis I (2022) RLOC: Terrain-aware legged locomotion using reinforcement learning and optimal control. IEEE Trans Robot 38:2908\u20132927. 
https:\/\/doi.org\/10.1109\/TRO.2022.3172469","journal-title":"IEEE Trans Robot"},{"key":"11451_CR35","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2019.103360","volume":"88","author":"J Garc\u00eda","year":"2020","unstructured":"Garc\u00eda J, Shafie D (2020) Teaching a humanoid robot to walk faster through safe reinforcement learning. Eng Appl Artif Intell 88:103360. https:\/\/doi.org\/10.1016\/j.engappai.2019.103360","journal-title":"Eng Appl Artif Intell"},{"key":"11451_CR36","doi-asserted-by":"crossref","unstructured":"Gaspard C, Passault G, Daniel M, Ly O (2024) FootstepNet: an efficient actor-critic method for fast on-line bipedal footstep planning and forecasting. arXiv preprint arXiv:2403.12589","DOI":"10.1109\/IROS58592.2024.10802320"},{"key":"11451_CR37","doi-asserted-by":"publisher","first-page":"3926","DOI":"10.1109\/LRA.2021.3066833","volume":"6","author":"K Green","year":"2021","unstructured":"Green K, Godse Y, Dao J, Hatton RL, Fern A, Hurst J (2021) Learning spring mass locomotion: Guiding policies with a reduced-order model. IEEE Robot Autom Lett 6:3926\u20133932. https:\/\/doi.org\/10.1109\/LRA.2021.3066833","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR38","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1080\/01691864.2017.1308270","volume":"31","author":"S Gupta","year":"2017","unstructured":"Gupta S, Kumar A (2017) A brief review of dynamics and control of underactuated biped robots. Adv Robot 31:607\u2013623","journal-title":"Adv Robot"},{"key":"11451_CR39","doi-asserted-by":"publisher","first-page":"8022","DOI":"10.1126\/scirobotics.adi8022","volume":"9","author":"T Haarnoja","year":"2024","unstructured":"Haarnoja T, Moran B, Lever G, Huang SH, Tirumala D, Humplik J, Wulfmeier M, Tunyasuvunakool S, Siegel NY, Hafner R et al (2024) Learning agile soccer skills for a bipedal robot with deep reinforcement learning. 
Sci Robot 9:8022","journal-title":"Sci Robot"},{"key":"11451_CR40","unstructured":"Heess N, TB D, Sriram S, Lemmon J, Merel J, Wayne G, Tassa Y, Erez T, Wang Z, Eslami A, Riedmiller M, Silver D (2017) Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286"},{"key":"11451_CR41","doi-asserted-by":"publisher","unstructured":"Herzog A, Schaal S, Righetti L (2016) Structured contact force optimization for kino-dynamic motion generation. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 2703\u20132710. https:\/\/doi.org\/10.1109\/IROS.2016.7759420","DOI":"10.1109\/IROS.2016.7759420"},{"key":"11451_CR42","unstructured":"Ho J, Ermon S (2016) Generative adversarial imitation learning. In: International Conference on Neural Information Processing Systems, pp 4572\u20134580"},{"key":"11451_CR43","doi-asserted-by":"publisher","first-page":"975","DOI":"10.3390\/machines10110975","volume":"10","author":"L Hou","year":"2022","unstructured":"Hou L, Li B, Liu W, Xu Y, Yang S, Rong X (2022) Deep reinforcement learning for model predictive controller based on disturbed single rigid body model of biped robots. Machines 10:975","journal-title":"Machines"},{"key":"11451_CR44","doi-asserted-by":"publisher","first-page":"7686","DOI":"10.1109\/TPAMI.2022.3223407","volume":"45","author":"C Huang","year":"2023","unstructured":"Huang C, Wang G, Zhou Z, Zhang R, Lin L (2023) Reward-adaptive reinforcement learning: Dynamic policy gradient optimization for bipedal locomotion. IEEE Trans Pattern Anal Mach Intell 45:7686\u20137695. https:\/\/doi.org\/10.1109\/TPAMI.2022.3223407","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11451_CR45","unstructured":"Huang X, Chi Y, Wang R, Li Z, Peng XB, Shao S, Nikolic B, Sreenath K (2024) DiffuseLoco: Real-time legged locomotion control with diffusion from offline datasets. 
arXiv preprint arXiv:2404.19264"},{"key":"11451_CR48","unstructured":"Hu Y, Xie Q, Jain V, Francis J, Patrikar J, Keetha N, Kim S, Xie Y, Zhang T, Fang H-S, et al (2023) Toward general-purpose robots via foundation models: A survey and meta-analysis. arXiv preprint arXiv:2312.08782"},{"key":"11451_CR46","doi-asserted-by":"crossref","unstructured":"Hu J, Hendrix R, Farhadi A, Kembhavi A, Martin-Martin R, Stone P, Zeng K-H, Ehsani K (2024) Flare: Achieving masterful and adaptive robot policies with large-scale reinforcement learning fine-tuning. arXiv: 2409.16578","DOI":"10.1109\/ICRA55743.2025.11127934"},{"key":"11451_CR47","doi-asserted-by":"crossref","unstructured":"Humphreys J, Zhou C (2024) Learning to adapt through bio-inspired gait strategies for versatile quadruped locomotion. arXiv preprint arXiv:2412.09440","DOI":"10.1038\/s42256-025-01065-z"},{"key":"11451_CR49","doi-asserted-by":"publisher","first-page":"5872","DOI":"10.1126\/scirobotics.aau5872","volume":"4","author":"J Hwangbo","year":"2019","unstructured":"Hwangbo J, Lee J, Dosovitskiy A, Bellicoso D, Tsounis V, Koltun V, Hutter M (2019) Learning agile and dynamic motor skills for legged robots. Sci Robot 4:5872","journal-title":"Sci Robot"},{"key":"11451_CR50","unstructured":"Irpan A, Herzog A, Toshev AT, Zeng A, Brohan A, Ichter BA, David B, Parada C, Finn C, Tan C, et al (2022) Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691"},{"key":"11451_CR51","doi-asserted-by":"publisher","first-page":"3395","DOI":"10.1109\/TRO.2022.3186804","volume":"38","author":"F Jenelten","year":"2022","unstructured":"Jenelten F, Grandia R, Farshidian F, Hutter M (2022) TAMOLS: Terrain-aware motion optimization for legged systems. IEEE Trans Robot 38:3395\u20133413. 
https:\/\/doi.org\/10.1109\/TRO.2022.3186804","journal-title":"IEEE Trans Robot"},{"key":"11451_CR52","doi-asserted-by":"publisher","first-page":"5401","DOI":"10.1126\/scirobotics.adh5401","volume":"9","author":"F Jenelten","year":"2024","unstructured":"Jenelten F, He J, Farshidian F, Hutter M (2024) DTC: Deep tracking control. Sci Robot 9:5401. https:\/\/doi.org\/10.1126\/scirobotics.adh5401","journal-title":"Sci Robot"},{"key":"11451_CR53","doi-asserted-by":"publisher","first-page":"6619","DOI":"10.1109\/LRA.2023.3307008","volume":"8","author":"D Kang","year":"2023","unstructured":"Kang D, Cheng J, Zamora M, Zargarbashi F, Coros S (2023) RL + Model-Based Control: Using on-demand optimal control to learn versatile legged locomotion. IEEE Robot Autom Lett 8:6619\u20136626. https:\/\/doi.org\/10.1109\/LRA.2023.3307008","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR54","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2021.103900","volume":"146","author":"M Kasaei","year":"2021","unstructured":"Kasaei M, Abreu M, Lau N, Pereira A, Reis L (2021) Robust biped locomotion using deep reinforcement learning on top of an analytical control approach. Robot Autonom Syst 146:103900. https:\/\/doi.org\/10.1016\/j.robot.2021.103900","journal-title":"Robot Autonom Syst"},{"key":"11451_CR55","doi-asserted-by":"publisher","first-page":"176598","DOI":"10.1109\/ACCESS.2020.3027152","volume":"8","author":"MA-M Khan","year":"2020","unstructured":"Khan MA-M, Khan MRJ, Tooshil A, Sikder N, Mahmud MAP, Kouzani AZ, Nahid A-A (2020) A systematic review on reinforcement learning-based robotics within the last decade. IEEE Access 8:176598\u2013176623. https:\/\/doi.org\/10.1109\/ACCESS.2020.3027152","journal-title":"IEEE Access"},{"key":"11451_CR57","doi-asserted-by":"publisher","unstructured":"Kumar A, Li Z, Zeng J, Pathak D, Sreenath K, Malik J (2022) Adapting rapid motor adaptation for bipedal robots. 
In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 1161\u20131168. https:\/\/doi.org\/10.1109\/IROS47612.2022.9981091","DOI":"10.1109\/IROS47612.2022.9981091"},{"key":"11451_CR56","unstructured":"Kumar KN, Essa I, Ha S (2023) Words into action: Learning diverse humanoid robot behaviors using language guided iterative motion refinement. In: Workshop on Language and Robot Learning: Language as Grounding"},{"key":"11451_CR58","doi-asserted-by":"publisher","first-page":"76523","DOI":"10.1109\/ACCESS.2022.3176608","volume":"10","author":"J Leng","year":"2022","unstructured":"Leng J, Fan S, Tang J, Mou H, Xue J, Li Q (2022) M-A3C: A mean-asynchronous advantage actor-critic reinforcement learning method for real-time gait planning of biped robot. IEEE Access 10:76523\u201376536. https:\/\/doi.org\/10.1109\/ACCESS.2022.3176608","journal-title":"IEEE Access"},{"key":"11451_CR59","doi-asserted-by":"publisher","first-page":"1279","DOI":"10.1109\/LCSYS.2023.3234769","volume":"7","author":"J Li","year":"2023","unstructured":"Li J, Nguyen Q (2023) Dynamic walking of bipedal robots on uneven stepping stones via adaptive-frequency mpc. IEEE Control Syst Lett 7:1279\u20131284. https:\/\/doi.org\/10.1109\/LCSYS.2023.3234769","journal-title":"IEEE Control Syst Lett"},{"key":"11451_CR62","doi-asserted-by":"publisher","unstructured":"Li T, Geyer H, Atkeson CG, Rai A (2019) Using deep reinforcement learning to learn high-level policies on the ATRIAS biped. In: International Conference on Robotics and Automation, pp 263\u2013269. https:\/\/doi.org\/10.1109\/ICRA.2019.8793864","DOI":"10.1109\/ICRA.2019.8793864"},{"key":"11451_CR61","doi-asserted-by":"publisher","unstructured":"Li Z, Cheng X, Peng XB, Abbeel P, Levine S, Berseth G, Sreenath K (2021) Reinforcement learning for robust parameterized locomotion control of bipedal robots. In: IEEE International Conference on Robotics and Automation, pp 2811\u20132817. 
https:\/\/doi.org\/10.1109\/ICRA48506.2021.9560769","DOI":"10.1109\/ICRA48506.2021.9560769"},{"key":"11451_CR69","doi-asserted-by":"publisher","unstructured":"Li Z, Zeng J, Thirugnanam A, Sreenath K (2022) Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models. In: Proceedings of Robotics: Science and Systems. https:\/\/doi.org\/10.15607\/RSS.2022.XVIII.033","DOI":"10.15607\/RSS.2022.XVIII.033"},{"key":"11451_CR65","doi-asserted-by":"publisher","unstructured":"Li Z, Peng XB, Abbeel P, Levine S, Berseth G, Sreenath K (2023) Robust and versatile bipedal jumping control through multi-task reinforcement learning. In: Robotics: Science and Systems. https:\/\/doi.org\/10.48550\/arXiv.2302.09450","DOI":"10.48550\/arXiv.2302.09450"},{"key":"11451_CR66","doi-asserted-by":"crossref","unstructured":"Li Z, Peng XB, Abbeel P, Levine S, Berseth G, Sreenath K (2024a) Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control. arXiv e-prints, 2401","DOI":"10.1177\/02783649241285161"},{"key":"11451_CR68","unstructured":"Li J, Ye L, Cheng Y, Liu H, Liang B (2024b) Agile and versatile bipedal robot tracking control through reinforcement learning. arXiv preprint arXiv:2404.08246"},{"key":"11451_CR63","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: International Conference on Learning Representations"},{"key":"11451_CR64","unstructured":"Lin J, Zeng A, Lu S, Cai Y, Zhang R, Wang H, Zhang L (2024) Motion-X: A large-scale 3D expressive whole-body human motion dataset. NeurIPS"},{"key":"11451_CR60","doi-asserted-by":"crossref","unstructured":"Liang J, Huang W, Xia F, Xu P, Hausman K, Ichter B, Florence P, Zeng A (2022) Code as policies: Language model programs for embodied control. 
In: arXiv Preprint arXiv:2209.07753","DOI":"10.1109\/ICRA48891.2023.10160591"},{"key":"11451_CR67","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2893476","volume":"35","author":"L Liu","year":"2016","unstructured":"Liu L, van de Panne M, Yin K (2016) Guided learning of control graphs for physics-based characters. ACM Trans Graph 35:1\u201314. https:\/\/doi.org\/10.1145\/2893476","journal-title":"ACM Trans Graph"},{"issue":"4","key":"11451_CR70","doi-asserted-by":"publisher","first-page":"3247","DOI":"10.1109\/LRA.2018.2851148","volume":"3","author":"K Lobos-Tsunekawa","year":"2018","unstructured":"Lobos-Tsunekawa K, Leiva F, Ruiz-del-Solar J (2018) Visual navigation for biped humanoid robots using deep reinforcement learning. IEEE Robot Autom Lett 3(4):3247\u20133254. https:\/\/doi.org\/10.1109\/LRA.2018.2851148","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR71","doi-asserted-by":"publisher","first-page":"2377","DOI":"10.1109\/LRA.2022.3143567","volume":"7","author":"Y Ma","year":"2022","unstructured":"Ma Y, Farshidian F, Miki T, Lee J, Hutter M (2022) Combining learning-based locomotion policy with model-based manipulation for legged mobile manipulators. IEEE Robot Autom Lett 7:2377\u20132384. https:\/\/doi.org\/10.1109\/LRA.2022.3143567","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR72","doi-asserted-by":"publisher","unstructured":"Mahmood N, Ghorbani N, Troje NF, Pons-Moll G, Black M (2019) AMASS: Archive of motion capture as surface shapes. In: 2019 IEEE\/CVF International Conference on Computer Vision, pp 5441\u20135450. https:\/\/doi.org\/10.1109\/ICCV.2019.00554","DOI":"10.1109\/ICCV.2019.00554"},{"key":"11451_CR73","unstructured":"Marum B, Sabatelli M, Kasaei H (2023a) Learning perceptive bipedal locomotion over irregular terrain. arXiv preprint arXiv:2304.07236"},{"key":"11451_CR74","unstructured":"Marum B, Sabatelli M, Kasaei H (2023b) Learning vision-based bipedal locomotion for challenging terrain. 
arXiv preprint arXiv:2309.14594"},{"key":"11451_CR75","doi-asserted-by":"crossref","unstructured":"Masuda S, Takahashi K (2023) Sim-to-real transfer of compliant bipedal locomotion on torque sensor-less gear-driven humanoid. In: IEEE-RAS International Conference on Humanoid Robots, pp. 1\u20138","DOI":"10.1109\/Humanoids57100.2023.10375181"},{"key":"11451_CR76","doi-asserted-by":"publisher","unstructured":"Meduri A, Khadiv M, Righetti L (2021) DeepQ Stepper: A framework for reactive dynamic walking on uneven terrain. In: IEEE International Conference on Robotics and Automation, pp 2099\u20132105. https:\/\/doi.org\/10.1109\/ICRA48506.2021.9562093","DOI":"10.1109\/ICRA48506.2021.9562093"},{"key":"11451_CR77","unstructured":"Merel J, Tassa Y, TB D, Srinivasan S, Lemmon J, Wang Z, Wayne G, Heess N (2017) Learning human behaviors from motion capture by adversarial imitation. arXiv:1707.02201"},{"issue":"6","key":"11451_CR78","doi-asserted-by":"publisher","first-page":"3740","DOI":"10.1109\/LRA.2023.3270034","volume":"8","author":"M Mittal","year":"2023","unstructured":"Mittal M, Yu C, Yu Q, Liu J, Rudin N, Hoeller D, Yuan JL, Singh R, Guo Y, Mazhar H, Mandlekar A, Babich B, State G, Hutter M, Garg A (2023) ORBIT: A unified simulation framework for interactive robot learning environments. IEEE Robot Autom Lett 8(6):3740\u20133747. https:\/\/doi.org\/10.1109\/LRA.2023.3270034","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR79","doi-asserted-by":"publisher","unstructured":"Morimoto J, Cheng G, Atkeson CG, Zeglin G (2004) A simple reinforcement learning algorithm for biped walking. In: IEEE International Conference on Robotics and Automation, pp 3030\u20133035. https:\/\/doi.org\/10.1109\/ROBOT.2004.1307522","DOI":"10.1109\/ROBOT.2004.1307522"},{"key":"11451_CR80","doi-asserted-by":"publisher","unstructured":"Nguyen Q, Hereid A, Grizzle JW, Ames AD, Sreenath K (2016) 3D dynamic walking on stepping stones with control barrier functions. 
In: IEEE Conference on Decision and Control, pp. 827\u2013834. https:\/\/doi.org\/10.1109\/CDC.2016.7798370","DOI":"10.1109\/CDC.2016.7798370"},{"key":"11451_CR81","doi-asserted-by":"crossref","unstructured":"Ouyang Y, Li J, Li Y, Li Z, Yu C, Sreenath K, Wu Y (2024) Long-horizon locomotion and manipulation on a quadrupedal robot with large language models. arXiv preprint arXiv:2404.05291","DOI":"10.1109\/IROS60139.2025.11246632"},{"key":"11451_CR82","first-page":"1","volume":"39","author":"H Park","year":"2020","unstructured":"Park H, Yu R, Lee Y, Lee K, Lee J (2020) Understanding the stability of deep control policies for biped locomotion. Vis Comput 39:1\u201315","journal-title":"Vis Comput"},{"key":"11451_CR83","doi-asserted-by":"publisher","unstructured":"Penco L, Clement B, Modugno V, Mingo\u00a0Hoffman E, Nava G, Pucci D, Tsagarakis NG, Mouret J-B, Ivaldi S (2018) Robust real-time whole-body motion retargeting from human to humanoid. In: 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), pp 425\u2013432. https:\/\/doi.org\/10.1109\/HUMANOIDS.2018.8624943","DOI":"10.1109\/HUMANOIDS.2018.8624943"},{"key":"11451_CR85","doi-asserted-by":"crossref","unstructured":"Peng XB, van de Panne M (2017) Learning locomotion skills using DeepRL: Does the choice of action space matter? In: ACM SIGGRAPH\/Eurographics Symposium on Computer Animation, pp 1\u201313","DOI":"10.1145\/3099564.3099567"},{"key":"11451_CR86","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3072959.3073602","volume":"36","author":"X Peng","year":"2017","unstructured":"Peng X, Berseth G, Yin K, van de Panne M (2017) DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Trans Graph 36:1\u201313. 
https:\/\/doi.org\/10.1145\/3072959.3073602","journal-title":"ACM Trans Graph"},{"key":"11451_CR87","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201311","volume":"37","author":"X Peng","year":"2018","unstructured":"Peng X, Abbeel P, Levine S, van de Panne M (2018) DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans Graph 37:143. https:\/\/doi.org\/10.1145\/3197517.3201311","journal-title":"ACM Trans Graph"},{"key":"11451_CR84","doi-asserted-by":"publisher","unstructured":"Peng XB, Coumans E, Zhang T, Lee T-WE, Tan J, Levine S (2020) Learning agile robotic locomotion skills by imitating animals. In: Robotics: Science and Systems. https:\/\/doi.org\/10.15607\/RSS.2020.XVI.064","DOI":"10.15607\/RSS.2020.XVI.064"},{"key":"11451_CR88","doi-asserted-by":"publisher","DOI":"10.1016\/j.ast.2023.108689","volume":"142","author":"J Qi","year":"2023","unstructured":"Qi J, Gao H, Su H, Han L, Su B, Huo M, Yu H, Deng Z (2023) Reinforcement learning-based stable jump control method for asteroid-exploration quadruped robots. Aerosp Sci Technol 142:108689","journal-title":"Aerosp Sci Technol"},{"key":"11451_CR89","doi-asserted-by":"crossref","unstructured":"Radosavovic I, Xiao T, Zhang B, Darrell T, Malik J, Sreenath K (2024a) Real-world humanoid locomotion with reinforcement learning. Sci Robot 9(89):9579","DOI":"10.1126\/scirobotics.adi9579"},{"key":"11451_CR90","unstructured":"Radosavovic I, Zhang B, Shi B, Rajasegaran J, Kamat S, Darrell T, Sreenath K, Malik J (2024b) Humanoid locomotion as next token prediction. arXiv preprint arXiv:2402.19469"},{"key":"11451_CR91","unstructured":"Rafailov R, Hatch KB, Kolev V, Martin JD, Phielipp M, Finn C (2023) MOTO: Offline pre-training to online fine-tuning for model-based robot learning. In: Proceedings of The 7th Conference on Robot Learning. Proceedings of Machine Learning Research, vol. 229, pp. 
3654\u20133671"},{"key":"11451_CR92","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1146\/annurev-control-071020-045021","volume":"4","author":"J Reher","year":"2021","unstructured":"Reher J, Ames A (2021) Dynamic walking: Toward agile and efficient bipedal robots. Ann Rev Control Robot Auton Syst 4:535\u2013572. https:\/\/doi.org\/10.1146\/annurev-control-071020-045021","journal-title":"Ann Rev Control Robot Auton Syst"},{"key":"11451_CR93","unstructured":"Robotics NA (2025) Isaac Lab. https:\/\/isaac-sim.github.io\/IsaacLab\/main\/index.html. Accessed: 2025-06-06"},{"key":"11451_CR94","doi-asserted-by":"publisher","unstructured":"Rodriguez D, Behnke S (2021) DeepWalk: Omnidirectional bipedal gait by deep reinforcement learning. In: IEEE International Conference on Robotics and Automation, pp 3033\u20133039. https:\/\/doi.org\/10.1109\/ICRA48506.2021.9561717","DOI":"10.1109\/ICRA48506.2021.9561717"},{"key":"11451_CR95","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1109\/TRO.2021.3084374","volume":"38","author":"N Rudin","year":"2022","unstructured":"Rudin N, Kolvenbach H, Tsounis V, Hutter M (2022) Cat-like jumping and landing of legged robots in low gravity using deep reinforcement learning. IEEE Trans Robot 38:317\u2013328. https:\/\/doi.org\/10.1109\/TRO.2021.3084374","journal-title":"IEEE Trans Robot"},{"key":"11451_CR96","unstructured":"Rudin N, Hoeller D, Reist P, Hutter M (2022) Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning, pp 91\u2013100"},{"key":"11451_CR97","unstructured":"Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889\u20131897"},{"key":"11451_CR98","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. 
arXiv:1707.06347"},{"key":"11451_CR99","doi-asserted-by":"crossref","unstructured":"Seo M, Han S, Sim K, Bang SH, Gonzalez C, Sentis L, Zhu Y (2023) Deep imitation learning for humanoid loco-manipulation through human teleoperation. In: IEEE-RAS International Conference on Humanoid Robots, pp 1\u20138","DOI":"10.1109\/Humanoids57100.2023.10375203"},{"key":"11451_CR102","doi-asserted-by":"crossref","unstructured":"Siekmann J, Valluri S, Dao J, Bermillo L, Duan H, Fern A, Hurst JW (2020) Learning memory-based control for human-scale bipedal locomotion. In: Robotics Science and Systems","DOI":"10.15607\/RSS.2020.XVI.031"},{"key":"11451_CR100","doi-asserted-by":"crossref","unstructured":"Siekmann J, Godse Y, Fern A, Hurst J (2021a) Sim-to-real learning of all common bipedal gaits via periodic reward composition. In: IEEE International Conference on Robotics and Automation, pp 7309\u20137315","DOI":"10.1109\/ICRA48506.2021.9561814"},{"key":"11451_CR101","doi-asserted-by":"publisher","unstructured":"Siekmann J, Green K, Warila J, Fern A, Hurst J (2021b) Blind bipedal stair traversal via sim-to-real reinforcement learning. In: Robotics: Science and Systems. https:\/\/doi.org\/10.15607\/RSS.2021.XVII.061","DOI":"10.15607\/RSS.2021.XVII.061"},{"key":"11451_CR103","doi-asserted-by":"publisher","unstructured":"Singh RP, Benallegue M, Morisawa M, Cisneros R, Kanehiro F (2022) Learning bipedal walking on planned footsteps for humanoid robots. In: IEEE-RAS International Conference on Humanoid Robots, pp 686\u2013693. https:\/\/doi.org\/10.1109\/Humanoids53995.2022.10000067","DOI":"10.1109\/Humanoids53995.2022.10000067"},{"key":"11451_CR104","doi-asserted-by":"publisher","first-page":"82013","DOI":"10.1109\/access.2023.3301175","volume":"11","author":"RP Singh","year":"2023","unstructured":"Singh RP, Xie Z, Gergondet P, Kanehiro F (2023) Learning bipedal walking for humanoids with current feedback. 
IEEE Access 11:82013\u201382023. https:\/\/doi.org\/10.1109\/access.2023.3301175","journal-title":"IEEE Access"},{"key":"11451_CR105","doi-asserted-by":"crossref","unstructured":"Smith L, Kostrikov I, Levine S (2023) Demonstrating a walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. Robotics: Science and Systems Demo 2, 4","DOI":"10.15607\/RSS.2023.XIX.056"},{"key":"11451_CR106","doi-asserted-by":"crossref","unstructured":"Tang A, Hiraoka T, Hiraoka N, Shi F, Kawaharazuka K, Kojima K, Okada K, Inaba M (2023) HumanMimic: Learning natural locomotion and transitions for humanoid robot via wasserstein adversarial imitation. arXiv preprint arXiv:2309.14225","DOI":"10.1109\/ICRA57147.2024.10610449"},{"key":"11451_CR107","doi-asserted-by":"publisher","first-page":"2802","DOI":"10.1109\/TCSII.2022.3145373","volume":"69","author":"C Tao","year":"2022","unstructured":"Tao C, Xue J, Zhang Z, Gao Z (2022) Parallel deep reinforcement learning method for gait control of biped robot. IEEE Trans Circuits Syst II Express Briefs 69:2802\u20132806. https:\/\/doi.org\/10.1109\/TCSII.2022.3145373","journal-title":"IEEE Trans Circuits Syst II Express Briefs"},{"key":"11451_CR108","doi-asserted-by":"publisher","first-page":"2300352","DOI":"10.1002\/aisy.202300352","volume":"6","author":"C Tao","year":"2023","unstructured":"Tao C, Li M, Cao F, Gao Z, Zhang Z (2023) A multiobjective collaborative deep reinforcement learning algorithm for jumping optimization of bipedal robot. Adv Intell Syst 6:2300352. https:\/\/doi.org\/10.1002\/aisy.202300352","journal-title":"Adv Intell Syst"},{"key":"11451_CR109","doi-asserted-by":"publisher","unstructured":"Taylor M, Bashkirov S, Rico JF, Toriyama I, Miyada N, Yanagisawa H, Ishizuka K (2021) Learning bipedal robot locomotion from human movement. In: IEEE International Conference on Robotics and Automation, pp 2797\u20132803. 
https:\/\/doi.org\/10.1109\/ICRA48506.2021.9561591","DOI":"10.1109\/ICRA48506.2021.9561591"},{"key":"11451_CR110","doi-asserted-by":"publisher","unstructured":"Tedrake R, Zhang TW, Seung HS (2004) Stochastic policy gradient reinforcement learning on a simple 3D biped. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 2849\u20132854. https:\/\/doi.org\/10.1109\/IROS.2004.1389841","DOI":"10.1109\/IROS.2004.1389841"},{"key":"11451_CR111","doi-asserted-by":"crossref","unstructured":"Todorov E, Erez T, Tassa Y (2012) MuJoCo: A physics engine for model-based control. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 5026\u20135033","DOI":"10.1109\/IROS.2012.6386109"},{"key":"11451_CR112","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1109\/JAS.2023.124140","volume":"11","author":"Y Tong","year":"2024","unstructured":"Tong Y, Liu H, Zhang Z (2024) Advancements in humanoid robots: A comprehensive review and future prospects. IEEE\/CAA J Autom Sin 11:301\u2013328","journal-title":"IEEE\/CAA J Autom Sin"},{"key":"11451_CR113","doi-asserted-by":"publisher","first-page":"1873","DOI":"10.3390\/s23041873","volume":"23","author":"S Wang","year":"2023","unstructured":"Wang S, Piao S, Leng X, He Z (2023) Learning 3D bipedal walking with planned footsteps and Fourier series periodic gait planning. Sensors 23:1873","journal-title":"Sensors"},{"key":"11451_CR114","doi-asserted-by":"publisher","unstructured":"Wensing PM, Orin DE (2013) High-speed humanoid running through control with a 3D-SLIP model. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 5134\u20135140. https:\/\/doi.org\/10.1109\/IROS.2013.6697099","DOI":"10.1109\/IROS.2013.6697099"},{"key":"11451_CR115","unstructured":"Wu P, Escontrela A, Hafner D, Abbeel P, Goldberg K (2023) DayDreamer: World models for physical robot learning. In: Conference on Robot Learning, pp. 
2226\u20132240"},{"key":"11451_CR117","doi-asserted-by":"publisher","unstructured":"Xie Z, Berseth G, Clary P, Hurst J, van de Panne M (2018) Feedback control for cassie with deep reinforcement learning. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp. 1241\u20131246. https:\/\/doi.org\/10.1109\/IROS.2018.8593722","DOI":"10.1109\/IROS.2018.8593722"},{"key":"11451_CR116","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1111\/cgf.14115","volume":"39","author":"Z Xie","year":"2020","unstructured":"Xie Z, Ling H, Kim N, van de Panne M (2020a) ALLSTEPS: Curriculum-driven learning of stepping stone skills. Comput Graph Forum 39:213\u2013224. https:\/\/doi.org\/10.1111\/cgf.14115","journal-title":"Comput Graph Forum"},{"key":"11451_CR118","unstructured":"Xie Z, Clary P, Dao J, Morais P, Hurst J, van de Panne M (2020b) Learning locomotion skills for cassie: Iterative design and sim-to-real. In: Conference on Robot Learning, pp 317\u2013329"},{"key":"11451_CR120","doi-asserted-by":"crossref","unstructured":"Yang M, Yang E, Zante RC, Post M, Liu X (2019a) Collaborative mobile industrial manipulator: a review of system architecture and applications. In: International Conference on Automation and Computing, pp 1\u20136","DOI":"10.23919\/IConAC.2019.8895183"},{"key":"11451_CR121","doi-asserted-by":"publisher","unstructured":"Yang C, Yuan K, Merkt W, Komura T, Vijayakumar S, Li Z (2019b) Learning whole-body motor skills for humanoids. In: IEEE-RAS International Conference on Humanoid Robots, pp 270\u2013276. https:\/\/doi.org\/10.1109\/HUMANOIDS.2018.8625045","DOI":"10.1109\/HUMANOIDS.2018.8625045"},{"key":"11451_CR119","doi-asserted-by":"publisher","first-page":"2610","DOI":"10.1109\/LRA.2020.2972879","volume":"5","author":"C Yang","year":"2020","unstructured":"Yang C, Yuan K, Heng S, Komura T, Li Z (2020) Learning natural locomotion behaviors for humanoid robots using human bias. IEEE Robot Autom Lett 5:2610\u20132617. 
https:\/\/doi.org\/10.1109\/LRA.2020.2972879","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR122","doi-asserted-by":"publisher","unstructured":"Yao Y, He W, Gu C, Du J, Tan F, Zhu Z, Lu J (2024) AnyBipe: An end-to-end framework for training and deploying bipedal robots guided by large language models [cs.RO] https:\/\/doi.org\/10.48550\/arXiv.2409.08904","DOI":"10.48550\/arXiv.2409.08904"},{"key":"11451_CR123","doi-asserted-by":"publisher","unstructured":"Yin F, Tang A, Xu L, Cao Y, Zheng Y, Zhang Z, Chen X (2021) Run like a dog: Learning based whole-body control framework for quadruped gait style transfer. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp. 8508\u20138514. https:\/\/doi.org\/10.1109\/IROS51168.2021.9636805","DOI":"10.1109\/IROS51168.2021.9636805"},{"key":"11451_CR124","doi-asserted-by":"publisher","first-page":"10312","DOI":"10.1109\/LRA.2022.3191071","volume":"7","author":"C Yu","year":"2022","unstructured":"Yu C, Rosendo A (2022) Multi-modal legged locomotion framework with automated residual reinforcement learning. IEEE Robot Autom Lett 7:10312\u201310319. https:\/\/doi.org\/10.1109\/LRA.2022.3191071","journal-title":"IEEE Robot Autom Lett"},{"key":"11451_CR125","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3197517.3201397","volume":"37","author":"W Yu","year":"2018","unstructured":"Yu W, Turk G, Liu CK (2018) Learning symmetric and low-energy locomotion. ACM Trans Graph 37:1\u201312. https:\/\/doi.org\/10.1145\/3197517.3201397","journal-title":"ACM Trans Graph"},{"key":"11451_CR126","doi-asserted-by":"crossref","unstructured":"Yu W, Kumar VCV, Turk G, Liu CK (2019) Sim-to-real transfer for biped locomotion. 
In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp 3503\u20133510","DOI":"10.1109\/IROS40897.2019.8968053"},{"key":"11451_CR127","doi-asserted-by":"crossref","unstructured":"Zhang Q, Cui P, Yan D, Sun J, Duan Y, Zhang A, Xu R (2024) Whole-body humanoid robot locomotion with human reference. arXiv preprint arXiv:2402.18294","DOI":"10.1109\/IROS58592.2024.10801451"},{"key":"11451_CR128","doi-asserted-by":"publisher","first-page":"4962","DOI":"10.1109\/TIE.2022.3190850","volume":"70","author":"W Zhu","year":"2023","unstructured":"Zhu W, Hayashibe M (2023) A hierarchical deep reinforcement learning framework with high efficiency and generalization for fast and safe navigation. IEEE Trans Ind Electron 70:4962\u20134971. https:\/\/doi.org\/10.1109\/TIE.2022.3190850","journal-title":"IEEE Trans Ind Electron"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-025-11451-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11451-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11451-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T03:10:18Z","timestamp":1769483418000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-025-11451-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,27]]},"references-count":128,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["11451"],"URL":"https:\/\/doi.org\/10.1007\/s10462-025-11451-z","relation":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":
"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,27]]},"assertion":[{"value":"25 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 December 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"38"}}