{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T17:29:30Z","timestamp":1778347770633,"version":"3.51.4"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,7,11]],"date-time":"2016-07-11T00:00:00Z","timestamp":1468195200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000038","name":"NSERC","doi-asserted-by":"crossref","award":["RGPIN-2015-04843"],"award-info":[{"award-number":["RGPIN-2015-04843"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2016,7,11]]},"abstract":"<jats:p>Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions. MACE learns more quickly than a single actor-critic approach and results in actor-critic experts that exhibit specialization. Additional elements of our solution that contribute towards efficient learning include Boltzmann exploration and the use of initial actor biases to encourage specialization. Results are demonstrated for multiple planar characters and terrain classes.<\/jats:p>","DOI":"10.1145\/2897824.2925881","type":"journal-article","created":{"date-parts":[[2016,7,11]],"date-time":"2016-07-11T16:04:33Z","timestamp":1468253073000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":133,"title":["Terrain-adaptive locomotion skills using deep reinforcement learning"],"prefix":"10.1145","volume":"35","author":[{"given":"Xue Bin","family":"Peng","sequence":"first","affiliation":[{"name":"University of British Columbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Glen","family":"Berseth","sequence":"additional","affiliation":[{"name":"University of British Columbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michiel","family":"van de Panne","sequence":"additional","affiliation":[{"name":"University of British Columbia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,7,11]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Assael J.-A. M. Wahlstr\u00f6m N. Sch\u00f6n T. B. and Deisenroth M. P. 2015. Data-efficient learning of feedback policies from image pixels using deep dynamical models. arXiv preprint arXiv:1510.02173.  Assael J.-A. M. Wahlstr\u00f6m N. Sch\u00f6n T. B. and Deisenroth M. P. 2015. Data-efficient learning of feedback policies from image pixels using deep dynamical models. arXiv preprint arXiv:1510.02173.","DOI":"10.1016\/j.ifacol.2015.12.271"},{"key":"e_1_2_2_2_1","unstructured":"Bullet 2015. Bullet physics library Dec. http:\/\/bulletphysics.org.  Bullet 2015. Bullet physics library Dec. http:\/\/bulletphysics.org."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2012.09.012"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409060.1409066"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618516"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781156"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964954"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360681"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531388"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976602753712972"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/383259.383287"},{"key":"e_1_2_2_12_1","doi-asserted-by":"crossref","volume-title":"Rigid body dynamics algorithms","author":"Featherstone R.","DOI":"10.1007\/978-1-4899-7560-7"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2012.03189.x"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280816"},{"key":"e_1_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Hansen N. 2006. The cma evolution strategy: A comparing review. In Towards a New Evolutionary Computation 75--102.  Hansen N. 2006. The cma evolution strategy: A comparing review. In Towards a New Evolutionary Computation 75--102.","DOI":"10.1007\/3-540-32494-1_4"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976601750541778"},{"key":"e_1_2_2_17_1","unstructured":"Hausknecht M. and Stone P. 2015. Deep reinforcement learning in parameterized action space. arXiv preprint arXiv:1511.04143.  Hausknecht M. and Stone P. 2015. Deep reinforcement learning in parameterized action space. arXiv preprint arXiv:1511.04143."},{"key":"e_1_2_2_18_1","unstructured":"Heess N. Wayne G. Silver D. Lillicrap T. Erez T. and Tassa Y. 2015. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems 2926--2934.   Heess N. Wayne G. Silver D. Lillicrap T. Erez T. and Tassa Y. 2015. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems 2926--2934."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-012-5322-7"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/218380.218414"},{"key":"e_1_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Jacobs R. A. Jordan M. I. Nowlan S. J. and Hinton G. E. 1991. Adaptive mixtures of local experts. Neural computation 3 1 79--87.  Jacobs R. A. Jordan M. I. Nowlan S. J. and Hinton G. E. 1991. Adaptive mixtures of local experts. Neural computation 3 1 79--87.","DOI":"10.1162\/neco.1991.3.1.79"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/237170.237231"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.gmod.2005.03.004"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618515"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882261.1866160"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781155"},{"key":"e_1_2_2_28_1","unstructured":"Levine S. and Abbeel P. 2014. Learning neural network policies with guided policy search under unknown dynamics. In Advances in Neural Information Processing Systems 27. 1071--1079.   Levine S. and Abbeel P. 2014. Learning neural network policies with guided policy search under unknown dynamics. In Advances in Neural Information Processing Systems 27 . 1071--1079."},{"key":"e_1_2_2_29_1","volume-title":"Proceedings of the 31st International Conference on Machine Learning (ICML-14)","author":"Levine S."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185524"},{"key":"e_1_2_2_31_1","unstructured":"Levine S. Finn C. Darrell T. and Abbeel P. 2015. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702.   Levine S. Finn C. Darrell T. and Abbeel P. 2015. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702."},{"key":"e_1_2_2_32_1","unstructured":"Lillicrap T. P. Hunt J. J. Pritzel A. Heess N. Erez T. Tassa Y. Silver D. and Wierstra D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.  Lillicrap T. P. Hunt J. J. Pritzel A. Heess N. Erez T. Tassa Y. Silver D. and Wierstra D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366173"},{"key":"e_1_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Mnih V. Kavukcuoglu K. Silver D. Rusu A. A. Veness J. Bellemare M. G. Graves A. Riedmiller M. Fidjeland A. K. Ostrovski G. etal 2015. Human-level control through deep reinforcement learning. Nature 518 7540 529--533.  Mnih V. Kavukcuoglu K. Silver D. Rusu A. A. Veness J. Bellemare M. G. Graves A. Riedmiller M. Fidjeland A. K. Ostrovski G. et al. 2015. Human-level control through deep reinforcement learning. Nature 518 7540 529--533.","DOI":"10.1038\/nature14236"},{"key":"e_1_2_2_35_1","volume-title":"Robotics: Science and Systems (RSS).","author":"Mordatch I.","year":"2014"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778808"},{"key":"e_1_2_2_37_1","unstructured":"Mordatch I. Lowrey K. Andrew G. Popovic Z. and Todorov E. V. 2015. Interactive control of diverse complex characters with neural networks. In Advances in Neural Information Processing Systems 3114--3122.   Mordatch I. Lowrey K. Andrew G. Popovic Z. and Todorov E. V. 2015. Interactive control of diverse complex characters with neural networks. In Advances in Neural Information Processing Systems 3114--3122."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531387"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1966394.1966395"},{"key":"e_1_2_2_40_1","unstructured":"Nair A. Srinivasan P. Blackwell S. Alcicek C. Fearon R. De Maria A. Panneershelvam V. Suley-man M. Beattie C. Petersen S. et al. 2015. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296.  Nair A. Srinivasan P. Blackwell S. Alcicek C. Fearon R. De Maria A. Panneershelvam V. Suley-man M. Beattie C. Petersen S. et al. 2015. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296 ."},{"key":"e_1_2_2_41_1","volume-title":"Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint arXiv:1511.06342.","author":"Parisotto E.","year":"2015"},{"key":"e_1_2_2_42_1","volume-title":"Humanoid Robots (Humanoids), 2012 12th IEEE-RAS International Conference on, IEEE, 309--315","author":"Pastor P."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766910"},{"key":"e_1_2_2_44_1","unstructured":"Rusu A. A. Colmenarejo S. G. Gulcehre C. Desjardins G. Kirkpatrick J. Pascanu R. Mnih V. Kavukcuoglu K. and Hadsell R. 2015. Policy distillation. arXiv preprint arXiv:1511.06295.  Rusu A. A. Colmenarejo S. G. Gulcehre C. Desjardins G. Kirkpatrick J. Pascanu R. Mnih V. Kavukcuoglu K. and Hadsell R. 2015. Policy distillation. arXiv preprint arXiv:1511.06295."},{"key":"e_1_2_2_45_1","unstructured":"Schaul T. Quan J. Antonoglou I. and Silver D. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952.  Schaul T. Quan J. Antonoglou I. and Silver D. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952."},{"key":"e_1_2_2_46_1","unstructured":"Schulman J. Levine S. Moritz P. Jordan M. I. and Abbeel P. 2015. Trust region policy optimization. CoRR abs\/1502.05477.  Schulman J. Levine S. Moritz P. Jordan M. I. and Abbeel P. 2015. Trust region policy optimization. CoRR abs\/1502.05477."},{"key":"e_1_2_2_47_1","unstructured":"Silver D. Lever G. Heess N. Degris T. Wierstra D. and Riedmiller M. 2014. Deterministic policy gradient algorithms. In ICML.  Silver D. Lever G. Heess N. Degris T. Wierstra D. and Riedmiller M. 2014. Deterministic policy gradient algorithms. In ICML."},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276511"},{"key":"e_1_2_2_49_1","unstructured":"Stadie B. C. Levine S. and Abbeel P. 2015. Incentiviz-ing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814.  Stadie B. C. Levine S. and Abbeel P. 2015. Incentiviz-ing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2011.30"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601121"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276386"},{"key":"e_1_2_2_53_1","volume-title":"Proc. of International Conference on Simulation of Adaptive Behavior: From Animals and Animats, 287--296","author":"Uchibe E."},{"key":"e_1_2_2_54_1","first-page":"2579","article-title":"Visualizing high-dimensional data using t-sne","volume":"9","author":"van der Maaten L.","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_2_55_1","volume-title":"Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on, IEEE, 272--279","author":"Van Hasselt H."},{"key":"e_1_2_2_56_1","doi-asserted-by":"crossref","unstructured":"Van Hasselt H. Guez A. and Silver D. 2015. Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461.  Van Hasselt H. Guez A. and Silver D. 2015. Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"e_1_2_2_57_1","volume-title":"Reinforcement Learning","author":"Van Hasselt H."},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618514"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2008.920231"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778811"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276509"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360680"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2897824.2925881","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2897824.2925881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:38:46Z","timestamp":1750221526000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2897824.2925881"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,11]]},"references-count":62,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,7,11]]}},"alternative-id":["10.1145\/2897824.2925881"],"URL":"https:\/\/doi.org\/10.1145\/2897824.2925881","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,7,11]]},"assertion":[{"value":"2016-07-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}