{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T17:54:20Z","timestamp":1774720460015,"version":"3.50.1"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["594644"],"award-info":[{"award-number":["594644"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Comput. Graph. Interact. Tech."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>Locomotion is fundamental to the repertoire of skills required of physics-based human-like characters. Control policies are most commonly developed using reinforcement learning (RL) and using reward functions based on imitation of motion capture data. In this work, we propose an imitation-free RL training pipeline for bipedal locomotion controllers, as achieved using a multistage learning curriculum. Our work makes several contributions. First, it introduces a minimal set of additional specifications so that imitation-free RL can learn a single policy capable of in-place turning, side-stepping, hopping, and one-step foot plants, in addition to forwards and backwards walking. Second, the method offers precise and flexible conditioning, with control over footstep locations and further optional control over footstep timing, and footstep orientation. Third, we demonstrate that this imitation-free RL pipeline works across a range of body morphologies. Last, we show that the use of a plasticity-preservation technique allows for significantly faster learning. 
Our results demonstrate the scalability and effectiveness of using imitation-free RL approaches to develop flexible and highly-directable locomotion policies.<\/jats:p>","DOI":"10.1145\/3747865","type":"journal-article","created":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T15:33:31Z","timestamp":1754667211000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Walk This Way: Imitation-free Reinforcement Learning of Flexibly-Constrained Walking Controllers 60"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4121-1061","authenticated-orcid":false,"given":"Tiffany","family":"Matth\u00e9","sequence":"first","affiliation":[{"name":"University of British Columbia","place":["Vancouver, Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4172-6151","authenticated-orcid":false,"given":"Nicholas","family":"Ioannidis","sequence":"additional","affiliation":[{"name":"University of British Columbia","place":["Vancouver, Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9123-3672","authenticated-orcid":false,"given":"Michiel","family":"van de Panne","sequence":"additional","affiliation":[{"name":"University of British Columbia","place":["Vancouver, Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,8]]},"reference":[{"key":"e_1_3_2_2_1","unstructured":"Lingfan Bao Joseph Humphreys Tianhu Peng and Chengxu Zhou. 2024. Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.17070 (2024)."},{"key":"e_1_3_2_3_1","doi-asserted-by":"crossref","unstructured":"Kevin Bergamin Simon Clavet Daniel Holden and James\u00a0Richard Forbes. 2019. DReCon: data-driven responsive control of physics-based characters. 
ACM Transactions On Graphics (TOG) 38 6 (2019) 1\u201311.","DOI":"10.1145\/3355089.3356536"},{"key":"e_1_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00611"},{"key":"e_1_3_2_5_1","unstructured":"Michal Ciebielski and Majid Khadiv. 2024. Contact-conditioned learning of locomotion policies. arxiv:https:\/\/arXiv.org\/abs\/2408.00776\u00a0[cs.RO] https:\/\/arxiv.org\/abs\/2408.00776"},{"key":"e_1_3_2_6_1","doi-asserted-by":"publisher","unstructured":"Shibhansh Dohare J.\u00a0Fernando Hernandez-Garcia Qingfeng Lan Parash Rahman A.\u00a0Rupam Mahmood and Richard\u00a0S. Sutton. 2024. Loss of plasticity in deep continual learning. Nature 632 8026 (2024) 768\u2013774. 10.1038\/s41586-024-07711-7","DOI":"10.1038\/s41586-024-07711-7"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9812015"},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981884"},{"key":"e_1_3_2_9_1","unstructured":"Pranay Dugar Aayam Shrestha Fangzhou Yu Bart van Marum and Alan Fern. 2024. Learning multi-modal whole-body control for real-world humanoid robots. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2408.07295 (2024)."},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14636"},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160562"},{"key":"e_1_3_2_12_1","doi-asserted-by":"crossref","unstructured":"Thomas Geijtenbeek Michiel Van De\u00a0Panne and A\u00a0Frank Van Der\u00a0Stappen. 2013. Flexible muscle-based locomotion for bipedal creatures. ACM Transactions on Graphics (TOG) 32 6 (2013) 1\u201311.","DOI":"10.1145\/2508363.2508399"},{"key":"e_1_3_2_13_1","unstructured":"Sehoon Ha Joonho Lee Michiel van\u00a0de Panne Zhaoming Xie Wenhao Yu and Majid Khadiv. 2024. Learning-based legged locomotion; state of the art and future perspectives. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.01152 (2024)."},{"key":"e_1_3_2_14_1","unstructured":"Ben Kenwright. 2022. Watch your step: Real-time adaptive character stepping. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.14730 (2022)."},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14504"},{"key":"e_1_3_2_16_1","doi-asserted-by":"crossref","unstructured":"Taesoo Kwon Yoonsang Lee and Michiel Van De\u00a0Panne. 2020. Fast and flexible multilegged locomotion using learned centroidal dynamics. ACM Transactions on Graphics (TOG) 39 4 (2020) 46\u20131.","DOI":"10.1145\/3386569.3392432"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS58592.2024.10801468"},{"key":"e_1_3_2_18_1","doi-asserted-by":"crossref","unstructured":"Yoonsang Lee Moon\u00a0Seok Park Taesoo Kwon and Jehee Lee. 2014. Locomotion control for many-muscle humanoids. ACM Transactions on Graphics (TOG) 33 6 (2014) 1\u201311.","DOI":"10.1145\/2661229.2661233"},{"key":"e_1_3_2_19_1","unstructured":"Zhongyu Li Xue\u00a0Bin Peng Pieter Abbeel Sergey Levine Glen Berseth and Koushil Sreenath. 2024. Reinforcement learning for versatile dynamic and robust bipedal locomotion control. The International Journal of Robotics Research (2024) 02783649241285161."},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01000"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1576246.1531386"},{"key":"e_1_3_2_22_1","doi-asserted-by":"crossref","unstructured":"Gangrae Park Jaepyung Hwang and Taesoo Kwon. 2025. Sample-efficient reference-free control strategy for multi-legged locomotion. Computers & Graphics 126 (2025) 104141.","DOI":"10.1016\/j.cag.2024.104141"},{"key":"e_1_3_2_23_1","doi-asserted-by":"crossref","unstructured":"Soohwan Park Hoseok Ryu Seyoung Lee Sunmin Lee and Jehee Lee. 2019. Learning predict-and-simulate policies from unorganized human motion data. 
ACM Transactions on Graphics (TOG) 38 6 (2019) 1\u201311.","DOI":"10.1145\/3355089.3356501"},{"key":"e_1_3_2_24_1","doi-asserted-by":"crossref","unstructured":"Xue\u00a0Bin Peng Pieter Abbeel Sergey Levine and Michiel Van\u00a0de Panne. 2018. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG) 37 4 (2018) 1\u201314.","DOI":"10.1145\/3197517.3201311"},{"key":"e_1_3_2_25_1","doi-asserted-by":"crossref","unstructured":"Xue\u00a0Bin Peng Glen Berseth KangKang Yin and Michiel Van De\u00a0Panne. 2017. Deeploco: Dynamic locomotion skills using hierarchical deep reinforcement learning. Acm transactions on graphics (tog) 36 4 (2017) 1\u201313.","DOI":"10.1145\/3072959.3073602"},{"key":"e_1_3_2_26_1","doi-asserted-by":"crossref","unstructured":"Xue\u00a0Bin Peng Ze Ma Pieter Abbeel Sergey Levine and Angjoo Kanazawa. 2021. Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (ToG) 40 4 (2021) 1\u201320.","DOI":"10.1145\/3450626.3459670"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01322"},{"key":"e_1_3_2_28_1","first-page":"627","volume-title":"Proceedings of the fourteenth international conference on artificial intelligence and statistics","author":"Ross St\u00e9phane","year":"2011","unstructured":"St\u00e9phane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 627\u2013635."},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/Humanoids53995.2022.10000067"},{"key":"e_1_3_2_30_1","unstructured":"Ghada Sokar Rishabh Agarwal Pablo\u00a0Samuel Castro and Utku Evci. 2023. The Dormant Neuron Phenomenon in Deep Reinforcement Learning. 
arxiv:https:\/\/arXiv.org\/abs\/2302.12902\u00a0[cs.LG] https:\/\/arxiv.org\/abs\/2302.12902"},{"key":"e_1_3_2_31_1","doi-asserted-by":"crossref","unstructured":"Seungmoon Song \u0141ukasz Kidzi\u0144ski Xue\u00a0Bin Peng Carmichael Ong Jennifer Hicks Sergey Levine Christopher\u00a0G Atkeson and Scott\u00a0L Delp. 2021. Deep reinforcement learning for modeling human locomotion control in neuromechanical simulation. Journal of neuroengineering and rehabilitation 18 (2021) 1\u201317.","DOI":"10.1186\/s12984-021-00919-y"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530697"},{"key":"e_1_3_2_33_1","doi-asserted-by":"crossref","unstructured":"Chen Tessler Yunrong Guo Ofir Nabati Gal Chechik and Xue\u00a0Bin Peng. 2024. MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting. ACM Transactions on Graphics (TOG) (2024).","DOI":"10.1145\/3687951"},{"key":"e_1_3_2_34_1","unstructured":"Guy Tevet Sigal Raab Setareh Cohan Daniele Reda Zhengyi Luo Xue\u00a0Bin Peng Amit\u00a0H Bermano and Michiel van\u00a0de Panne. 2024. CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.03441 (2024)."},{"key":"e_1_3_2_35_1","first-page":"9","volume-title":"Graphics Interface","author":"Basten Ben\u00a0JH van","year":"2011","unstructured":"Ben\u00a0JH van Basten, Sybren\u00a0A St\u00fcvel, and Arjan Egges. 2011. A hybrid interpolation scheme for footprint-driven walking synthesis.. In Graphics Interface. 9\u201316."},{"key":"e_1_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.00181"},{"key":"e_1_3_2_37_1","doi-asserted-by":"crossref","unstructured":"Jack\u00a0M Wang Samuel\u00a0R Hamner Scott\u00a0L Delp and Vladlen Koltun. 2012. Optimizing locomotion controllers using biologically-based actuators and objectives. 
ACM Transactions on Graphics (TOG) 31 4 (2012) 1\u201311.","DOI":"10.1145\/2185520.2335376"},{"key":"e_1_3_2_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/1921427.1921445"},{"key":"e_1_3_2_39_1","unstructured":"Jinze Wu Guiyang Xin Chenkun Qi and Yufei Xue. 2023. Learning robust and agile legged locomotion using adversarial motion priors. IEEE Robotics and Automation Letters (2023)."},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","unstructured":"Kaixiang Xie Pei Xu Sheldon Andrews Victor\u00a0B. Zordan and Paul\u00a0G. Kry. 2023. Too Stiff Too Strong Too Smart: Evaluating Fundamental Problems with Motion Control Policies. Proc. of the ACM on Computer Graphics and Interactive Techniques 6 3 (2023) 17\u00a0pages. 10.1145\/3606935","DOI":"10.1145\/3606935"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14115"},{"key":"e_1_3_2_42_1","doi-asserted-by":"crossref","unstructured":"Wenhao Yu Greg Turk and C\u00a0Karen Liu. 2018. Learning symmetric and low-energy locomotion. 
ACM Transactions on Graphics (TOG) 37 4 (2018) 1\u201312.","DOI":"10.1145\/3197517.3201397"}],"container-title":["Proceedings of the ACM on Computer Graphics and Interactive Techniques"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3747865","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T16:25:30Z","timestamp":1754670330000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3747865"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,8]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3747865"],"URL":"https:\/\/doi.org\/10.1145\/3747865","relation":{},"ISSN":["2577-6193"],"issn-type":[{"value":"2577-6193","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,8]]},"assertion":[{"value":"2025-08-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}