{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T03:55:00Z","timestamp":1781063700785,"version":"3.54.1"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,7,20]],"date-time":"2017-07-20T00:00:00Z","timestamp":1500508800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2017,8,31]]},"abstract":"<jats:p>Learning physics-based locomotion skills is a difficult problem, leading to solutions that typically exploit prior knowledge of various forms. In this paper we aim to learn a variety of environment-aware locomotion skills with a limited amount of prior knowledge. We adopt a two-level hierarchical control framework. First, low-level controllers are learned that operate at a fine timescale and which achieve robust walking gaits that satisfy stepping-target and style objectives. Second, high-level controllers are then learned which plan at the timescale of steps by invoking desired step targets for the low-level controller. The high-level controller makes decisions directly based on high-dimensional inputs, including terrain maps or other suitable representations of the surroundings. Both levels of the control policy are trained using deep reinforcement learning. Results are demonstrated on a simulated 3D biped. Low-level controllers are learned for a variety of motion styles and demonstrate robustness with respect to force-based disturbances, terrain variations, and style interpolation. 
High-level controllers are demonstrated that are capable of following trails through terrains, dribbling a soccer ball towards a target location, and navigating through static or dynamic obstacles.<\/jats:p>","DOI":"10.1145\/3072959.3073602","type":"journal-article","created":{"date-parts":[[2017,7,21]],"date-time":"2017-07-21T12:24:07Z","timestamp":1500639847000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":384,"title":["DeepLoco"],"prefix":"10.1145","volume":"36","author":[{"given":"Xue Bin","family":"Peng","sequence":"first","affiliation":[{"name":"University of British Columbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Glen","family":"Berseth","sequence":"additional","affiliation":[{"name":"University of British Columbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kangkang","family":"Yin","sequence":"additional","affiliation":[{"name":"National University of Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michiel","family":"Van De Panne","sequence":"additional","affiliation":[{"name":"University of British Columbia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2017,7,20]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2012.325"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366175"},{"key":"e_1_2_2_3_1","unstructured":"Bullet. 2015. Bullet Physics Library. (2015). http:\/\/bulletphysics.org."},{"key":"e_1_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Joel Chestnutt Manfred Lau German Cheung James Kuffner Jessica Hodgins and Takeo Kanade. 2005. 
Footstep Planning for the Honda ASIMO Humanoid. In ICRA05. 629--634.","DOI":"10.1109\/ROBOT.2005.1570188"},{"key":"e_1_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Stelian Coros Philippe Beaudoin and Michiel van de Panne. 2009. Robust task-based control policies for physics-based characters. ACM Transactions on Graphics 28 5 (2009) Article 170.","DOI":"10.1145\/1618452.1618516"},{"key":"e_1_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Stelian Coros Philippe Beaudoin and Michiel van de Panne. 2010. Generalized Biped Walking Control. ACM Transactions on Graphics 29 4 (2010) Article 130.","DOI":"10.1145\/1778765.1781156"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409060.1409066"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964954"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360681"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1833349.1781157"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell and Marcus Rohrbach. 2016. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. CoRR abs\/1606.01847 (2016). 
http:\/\/arxiv.org\/abs\/1606.01847","DOI":"10.18653\/v1\/D16-1044"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2012.03189.x"},{"key":"e_1_2_2_13_1","unstructured":"Michael X. Grey Aaron D. Ames and C. Karen Liu. 2016. Footstep and Motion Planning in Semi-unstructured Environments Using Possibility Graphs. CoRR abs\/1610.00700 (2016). http:\/\/arxiv.org\/abs\/1610.00700"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280816"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2767002"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-32494-1_4"},{"key":"e_1_2_2_17_1","unstructured":"Nicolas Heess Gregory Wayne Yuval Tassa Timothy P. Lillicrap Martin A. Riedmiller and David Silver. 2016. Learning and Transfer of Modulated Locomotor Controllers. CoRR abs\/1610.05182 (2016). http:\/\/arxiv.org\/abs\/1610.05182"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/218380.218414"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/70.508439"},{"key":"e_1_2_2_21_1","doi-asserted-by":"crossref","unstructured":"James Kuffner Koichi Nishiwaki Satoshi Kagami Masayuki Inaba and Hirochika Inoue. 2005. Motion Planning for Humanoid Robots. 
Springer Berlin Heidelberg 365--374.","DOI":"10.1007\/11008941_39"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073368.1073408"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028523.1028535"},{"key":"e_1_2_2_24_1","unstructured":"Yoonsang Lee Sungeun Kim and Jehee Lee. 2010. Data-Driven Biped Control. ACM Transactions on Graphics 29 4 (2010) Article 129."},{"key":"e_1_2_2_25_1","volume-title":"Advances in Neural Information Processing Systems 27","author":"Levine Sergey"},{"key":"e_1_2_2_26_1","volume-title":"Guided Policy Search. In ICML '13: Proceedings of the 30th International Conference on Machine Learning.","author":"Levine Sergey","year":"2013"},{"key":"e_1_2_2_27_1","volume-title":"Proceedings of the 31st International Conference on Machine Learning (ICML-14)","author":"Levine Sergey","year":"2014"},{"key":"e_1_2_2_28_1","unstructured":"Timothy P Lillicrap Jonathan J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2893476"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366173"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1576246.1531386"},{"key":"e_1_2_2_32_1","unstructured":"Volodymyr Mnih Adri\u00e0 Puigdom\u00e8nech Badia Mehdi Mirza Alex Graves Timothy P. Lillicrap Tim Harley David Silver and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. CoRR abs\/1602.01783 (2016). 
http:\/\/arxiv.org\/abs\/1602.01783"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778808"},{"key":"e_1_2_2_34_1","unstructured":"Igor Mordatch Kendall Lowrey Galen Andrew Zoran Popovic and Emanuel V Todorov. 2015. Interactive Control of Diverse Complex Characters with Neural Networks. In Advances in Neural Information Processing Systems. 3114--3122."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2014.X.052"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531387"},{"key":"e_1_2_2_37_1","volume-title":"Proceedings of the 27th International Conference on Machine Learning (ICML-10)","author":"Nair Vinod","year":"2010"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766910"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925881"},{"key":"e_1_2_2_40_1","unstructured":"Xue Bin Peng and Michiel van de Panne. 2016. Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter? CoRR abs\/1611.01055 (2016). 
http:\/\/arxiv.org\/abs\/1611.01055"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/566570.566636"},{"key":"e_1_2_2_42_1","volume-title":"SCA '03: Proceedings of the 2010 ACM SIGGRAPH\/Eurographics symposium on Computer animation. 258--264","author":"Pettr\u00e9 Julien","year":"2003"},{"key":"e_1_2_2_43_1","unstructured":"John Schulman Sergey Levine Philipp Moritz Michael I. Jordan and Pieter Abbeel. 2015. Trust Region Policy Optimization. CoRR abs\/1502.05477 (2015). http:\/\/arxiv.org\/abs\/1502.05477"},{"key":"e_1_2_2_44_1","volume-title":"International Conference on Learning Representations (ICLR","author":"Schulman John","year":"2016"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276511"},{"key":"e_1_2_2_46_1","volume-title":"Advances in Neural Information Processing Systems 12","author":"Sutton Richard S."},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601121"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2011.30"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386025"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27645-3_7"},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Jack M. Wang David J. Fleet and Aaron Hertzmann. 2009. Optimizing Walking Controllers. 
ACM Transactions on Graphics 28 5 (2009) Article 168.","DOI":"10.1145\/1618452.1618514"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509226"},{"key":"e_1_2_2_53_1","first-page":"4","article-title":"Terrain-adaptive bipedal locomotion control","volume":"29","author":"Zoran Popovi\u0107 Wu","year":"2010","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015706.1015756"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778811"},{"key":"e_1_2_2_56_1","doi-asserted-by":"crossref","unstructured":"KangKang Yin Stelian Coros Philippe Beaudoin and Michiel van de Panne. 2008. Continuation Methods for Adapting Simulated Skills. ACM Transactions on Graphics 27 3 (2008) Article 81.","DOI":"10.1145\/1360612.1360680"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1275808.1276509"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2015.7140083"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3072959.3073602","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3072959.3073602","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T17:07:54Z","timestamp":1750784874000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3072959.3073602"}},"subtitle":["dynamic locomotion skills using hierarchical deep reinforcement 
learning"],"short-title":[],"issued":{"date-parts":[[2017,7,20]]},"references-count":58,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,8,31]]}},"alternative-id":["10.1145\/3072959.3073602"],"URL":"https:\/\/doi.org\/10.1145\/3072959.3073602","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,7,20]]},"assertion":[{"value":"2017-07-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}