{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,31]],"date-time":"2026-07-31T15:47:36Z","timestamp":1785512856167,"version":"3.56.0"},"reference-count":112,"publisher":"American Association for the Advancement of Science (AAAS)","issue":"89","content-domain":{"domain":["www.science.org"],"crossmark-restriction":true},"short-container-title":["Sci. Robot."],"published-print":{"date-parts":[[2024,4,10]]},"abstract":"<jats:p>We investigated whether deep reinforcement learning (deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies. We used deep RL to train a humanoid robot to play a simplified one-versus-one soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth and efficient manner. It also learned to anticipate ball movements and block opponent shots. The agent\u2019s tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. Our agent was trained in simulation and transferred to real robots zero-shot. A combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training enabled good-quality transfer. 
In experiments, the agent walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline.<\/jats:p>","DOI":"10.1126\/scirobotics.adi8022","type":"journal-article","created":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T17:58:11Z","timestamp":1712771891000},"update-policy":"https:\/\/doi.org\/10.34133\/aaas_crossmark","source":"Crossref","is-referenced-by-count":155,"title":["Learning agile soccer skills for a bipedal robot with deep reinforcement learning"],"prefix":"10.1126","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-2973-9246","authenticated-orcid":true,"given":"Tuomas","family":"Haarnoja","sequence":"first","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9254-662X","authenticated-orcid":true,"given":"Ben","family":"Moran","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9551-1839","authenticated-orcid":true,"given":"Guy","family":"Lever","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8920-2247","authenticated-orcid":true,"given":"Sandy H.","family":"Huang","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dhruva","family":"Tirumala","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."},{"name":"University College London, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jan","family":"Humplik","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, 
UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1802-4492","authenticated-orcid":true,"given":"Markus","family":"Wulfmeier","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1620-6797","authenticated-orcid":true,"given":"Saran","family":"Tunyasuvunakool","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5746-117X","authenticated-orcid":true,"given":"Noah Y.","family":"Siegel","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8061-8828","authenticated-orcid":true,"given":"Roland","family":"Hafner","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2171-696X","authenticated-orcid":true,"given":"Michael","family":"Bloesch","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kristian","family":"Hartikainen","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Arunkumar","family":"Byravan","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1844-696X","authenticated-orcid":true,"given":"Leonard","family":"Hasenclever","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, 
UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1197-288X","authenticated-orcid":true,"given":"Yuval","family":"Tassa","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fereshteh","family":"Sadeghi","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nathan","family":"Batchelor","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Federico","family":"Casarini","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefano","family":"Saliceti","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6402-3014","authenticated-orcid":true,"given":"Charles","family":"Game","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Neil","family":"Sreendra","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."},{"name":"Proactive Global, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kushal","family":"Patel","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."},{"name":"Proactive Global, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marlon","family":"Gwira","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."},{"name":"Proactive Global, London, 
UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4431-8171","authenticated-orcid":true,"given":"Andrea","family":"Huber","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nicole","family":"Hurley","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3763-6873","authenticated-orcid":true,"given":"Francesco","family":"Nori","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2390-1771","authenticated-orcid":true,"given":"Raia","family":"Hadsell","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7876-9256","authenticated-orcid":true,"given":"Nicolas","family":"Heess","sequence":"additional","affiliation":[{"name":"Google DeepMind, London, UK."}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"221","reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"K. Sims \u201cEvolving virtual creatures\u201d in Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (ACM 1994) pp. 15\u201322.","DOI":"10.1145\/192161.192167"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"M. H. Raibert Legged Robots That Balance (MIT Press 1986).","DOI":"10.1109\/MEX.1986.4307016"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-015-9479-3"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2008.02.003"},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","unstructured":"M. P. Deisenroth G. Neumann J. 
Peters \u201cA survey on policy search for robotics\u201d in Foundations and Trends in Robotics vol. 2 no. 1\u20132 (Now Publishers 2013) pp. 1\u2013142.","DOI":"10.1561\/2300000021"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"e_1_3_2_8_2","unstructured":"N. Heess D. Tirumala S. Sriram J. Lemmon J. Merel G. Wayne Y. Tassa T. Erez Z. Wang A. Eslami M. Riedmiller D. Silver Emergence of locomotion behaviours in rich environments. arXiv:1707.02286 (2017)."},{"key":"e_1_3_2_9_2","unstructured":"T. Bansal J. Pachocki S. Sidor I. Sutskever I. Mordatch \u201cEmergent complexity via multi-agent competition\u201d in 6th International Conference on Learning Representations (ICLR 2018)."},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3197517.3201311","article-title":"DeepMimic: Example-guided deep reinforcement learning of physics-based character skills","volume":"37","author":"Peng X. B.","year":"2018","unstructured":"X. B. Peng, P. Abbeel, S. Levine, M. van de Panne, DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transac. Graph. 37, 1\u201314 (2018).","journal-title":"ACM Transac. Graph."},{"key":"e_1_3_2_11_2","first-page":"1","article-title":"Catch & Carry: Reusable neural controllers for vision-guided whole-body tasks","volume":"39","author":"Merel J.","year":"2020","unstructured":"J. Merel, S. Tunyasuvunakool, A. Ahuja, Y. Tassa, L. Hasenclever, V. Pham, T. Erez, G. Wayne, N. Heess, Catch & Carry: Reusable neural controllers for vision-guided whole-body tasks. ACM Transac. Graph. 39, 1\u201314 (2020).","journal-title":"ACM Transac. 
Graph."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abo0235"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abc5986"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.ade2256"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.aau5872"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","unstructured":"X. B. Peng E. Coumans T. Zhang T.-W. Lee J. Tan S. Levine Learning agile robotic locomotion skills by imitating animals. arXiv:2004.00784 (2020).","DOI":"10.15607\/RSS.2020.XVI.064"},{"key":"e_1_3_2_17_2","unstructured":"J. Lee J. Hwangbo M. Hutter Robust recovery controller for a quadrupedal robot using deep reinforcement learning. arXiv:1901.07517 (2019)."},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"N. Rudin D. Hoeller M. Bjelonic M. Hutter \u201cAdvanced skills by learning locomotion and local navigation end-to-end\u201d in 2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE 2022) pp. 2497\u20132503.","DOI":"10.1109\/IROS47612.2022.9981198"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Y. Ji G. B. Margolis P. Agrawal DribbleBot: Dynamic legged manipulation in the wild. arXiv:2304.01159 (2023).","DOI":"10.1109\/ICRA48891.2023.10160325"},{"key":"e_1_3_2_20_2","unstructured":"S. Bohez S. Tunyasuvunakool P. Brakel F. Sadeghi L. Hasenclever Y. Tassa E. Parisotto J. Humplik T. Haarnoja R. Hafner M. Wulfmeier M. Neunert B. Moran N. Siegel A. Huber F. Romano N. Batchelor F. Casarini J. Merel R. Hadsell N. Heess Imitate and repurpose: Learning reusable robot movement skills from human and animal behaviors. arXiv:2203.17138 (2022)."},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"Y. Ji Z. Li Y. Sun X. B. Peng S. Levine G. Berseth K. 
Sreenath \u201cHierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot\u201d in 2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE 2022) pp. 1479\u20131486.","DOI":"10.1109\/IROS47612.2022.9981984"},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","unstructured":"X. Huang Z. Li Y. Xiang Y. Ni Y. Chi Y. Li L. Yang X. B. Peng K. Sreenath Creating a dynamic quadrupedal robotic goalkeeper with reinforcement learning. arXiv:2210.04435 [cs.RO] (10 October 2022).","DOI":"10.1109\/IROS55552.2023.10341936"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"B. Forrai T. Miki D. Gehrig M. Hutter D. Scaramuzza Event-based agile object catching with a quadrupedal robot. arXiv:2303.17479 (2023).","DOI":"10.1109\/ICRA48891.2023.10161392"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","unstructured":"X. Cheng A. Kumar D. Pathak Legs as manipulator: Pushing quadrupedal agility beyond locomotion. arXiv:2303.11330 (2023).","DOI":"10.1109\/ICRA48891.2023.10161470"},{"key":"e_1_3_2_25_2","unstructured":"Z. Xie P. Clary J. Dao P. Morais J. W. Hurst M. van de Panne Iterative reinforcement learning based design of dynamic locomotion skills for Cassie. arXiv:1903.09537 [cs.RO] (22 March 2019)."},{"key":"e_1_3_2_26_2","unstructured":"Agility Robotics \u201cCassie sets world record for 100m run \u201d 2022; www.youtube.com\/watch?v=DdojWYOK0Nc."},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"J. Siekmann K. Green J. Warila A. Fern J. Hurst Blind bipedal stair traversal via sim-to-real reinforcement learning. arXiv:2105.08328 (2021).","DOI":"10.15607\/RSS.2021.XVII.061"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","unstructured":"Z. Li X. B. Peng P. Abbeel S. Levine G. Berseth K. Sreenath Robust and versatile bipedal jumping control through multi-task reinforcement learning. 
arXiv:2302.09450 [cs.RO] (1 June 2023).","DOI":"10.15607\/RSS.2023.XIX.052"},{"key":"e_1_3_2_29_2","unstructured":"R. Deits T. Koolen \u201cPicking up momentum \u201d Boston Dynamics January 2023; www.bostondynamics.com\/resources\/blog\/picking-momentum."},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"H. Kitano M. Asada Y. Kuniyoshi I. Noda E. Osawa \u201cRoboCup: The robot world cup initiative\u201d in Proceedings of the First International Conference on Autonomous Agents (ACM 1997) pp. 340\u2013347.","DOI":"10.1145\/267658.267738"},{"key":"e_1_3_2_31_2","unstructured":"RoboCup Federation \u201cRobocup project \u201d May 2022; https:\/\/robocup.org."},{"key":"e_1_3_2_32_2","unstructured":"Robotis \u201cRobotis OP3 manual \u201d March 2023; https:\/\/emanual.robotis.com\/docs\/en\/platform\/op3\/introduction."},{"key":"e_1_3_2_33_2","unstructured":"Robotis \u201cRobotis OP3 source code \u201d April 2023; https:\/\/github.com\/ROBOTIS-GIT\/ROBOTIS-OP3."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"M. Bestmann J. Zhang \u201cBipedal walking on humanoid robots through parameter optimization\u201d in RoboCup 2022: Robot World Cup XXV vol. 13561 of Lecture Notes in Computer Science A. Eguchi N. Lau M. Paetzel-Pr\u00fcsmann T. Wanichanon Eds. (Springer 2022) pp. 164\u2013176.","DOI":"10.1007\/978-3-031-28469-4_14"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.46409"},{"key":"e_1_3_2_36_2","doi-asserted-by":"crossref","unstructured":"L. McInnes J. Healy J. Melville UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426 (February 2018).","DOI":"10.21105\/joss.00861"},{"key":"e_1_3_2_37_2","unstructured":"T. R\u00f6fer T. Laue A. Baude J. Blumenkamp G. Felsch J. Fiedler A. Hasselbring T. Ha\u00df J. Oppermann P. Reichenberg N. Schrader D. 
Wei\u00df \u201cB-Human team report and code release 2019 \u201d 2019; http:\/\/b-human.de\/downloads\/publications\/2019\/CodeRelease2019.pdf."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364920987859"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abk2822"},{"key":"e_1_3_2_40_2","unstructured":"A. Agarwal A. Kumar J. Malik D. Pathak \u201cLegged locomotion in challenging terrains using egocentric vision\u201d in Conference on Robot Learning (MLResearchPress 2023) pp. 403\u2013415."},{"key":"e_1_3_2_41_2","unstructured":"I. Radosavovic T. Xiao B. Zhang T. Darrell J. Malik K. Sreenath Learning humanoid locomotion with transformers. arXiv:2303.03381 [cs.RO] (14 December 2023)."},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"A. Kumar Z. Fu D. Pathak J. Malik RMA: Rapid motor adaptation for legged robots. arXiv:2107.04034 (2021).","DOI":"10.15607\/RSS.2021.XVII.011"},{"key":"e_1_3_2_43_2","doi-asserted-by":"crossref","unstructured":"L. Smith J. C. Kew T. Li L. Luu X. B. Peng S. Ha J. Tan S. Levine Learning and adapting agile locomotion skills by transferring experience. arXiv:2304.09834 (2023).","DOI":"10.15607\/RSS.2023.XIX.051"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2022.799893"},{"key":"e_1_3_2_45_2","unstructured":"P. Wu A. Escontrela D. Hafner P. Abbeel K. Goldberg \u201cDayDreamer: World models for physical robot learning\u201d in Conference on Robot Learning (MLResearchPress 2023) pp. 2226\u20132240."},{"key":"e_1_3_2_46_2","doi-asserted-by":"crossref","unstructured":"T. Haarnoja S. Ha A. Zhou J. Tan G. Tucker S. Levine \u201cLearning to walk via deep reinforcement learning\u201d in Proceedings of Robotics: Science and Systems (RSS) A. Bicchi H. Kress-Gazit S. Hutchinson Eds. (RSS 2019).","DOI":"10.15607\/RSS.2019.XV.011"},{"key":"e_1_3_2_47_2","unstructured":"S. Ha P. Xu Z. Tan S. Levine J. 
Tan \u201cLearning to walk in the real world with minimal human effort\u201d in Conference on Robot Learning (MLResearchPress 2021) pp. 1110\u20131120."},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","unstructured":"L. Smith I. Kostrikov S. Levine A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. arXiv:2208.07860 (2022).","DOI":"10.15607\/RSS.2023.XIX.056"},{"key":"e_1_3_2_49_2","unstructured":"M. Bloesch J. Humplik V. Patraucean R. Hafner T. Haarnoja A. Byravan N. Y. Siegel S. Tunyasuvunakool F. Casarini N. Batchelor F. Romano S. Saliceti M. Riedmiller S. M. A. Eslami N. Heess \u201cTowards real robot learning in the wild: A case study in bipedal locomotion\u201d in Conference on Robot Learning (MLResearchPress 2022) pp. 1502\u20131511."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3151396"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","unstructured":"G. B. Margolis G. Yang K. Paigwar T. Chen P. Agrawal Rapid locomotion via reinforcement learning. arXiv:2205.02824 (2022).","DOI":"10.15607\/RSS.2022.XVIII.022"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-022-00576-3"},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","unstructured":"I. Mordatch K. Lowrey E. Todorov \u201cEnsemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids\u201d in 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE 2015) pp. 5307\u20135314.","DOI":"10.1109\/IROS.2015.7354126"},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","unstructured":"W. Yu V. C. Kumar G. Turk C. K. Liu \u201cSim-to-real transfer for biped locomotion\u201d in 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE 2019) pp. 3503\u20133510.","DOI":"10.1109\/IROS40897.2019.8968053"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"S. Masuda and K. 
Takahashi Sim-to-real learning of robust compliant bipedal locomotion on torque sensor-less gear-driven humanoid. arXiv:2204.03897 (2022).","DOI":"10.1109\/Humanoids57100.2023.10375181"},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","unstructured":"Y. Ma F. Farshidian M. Hutter Learning arm-assisted fall damage reduction and recovery for legged mobile manipulators. arXiv:2303.05486 (2023).","DOI":"10.1109\/ICRA48891.2023.10160582"},{"key":"e_1_3_2_57_2","unstructured":"O. Nachum M. Ahn H. Ponte S. Gu V. Kumar Multi-agent manipulation via locomotion using hierarchical sim2real. arXiv:1908.05224 (2019)."},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"M. Riedmiller A. Merke D. Meier A. Hoffmann A. Sinner O. Thate R. Ehrmann \u201cKarlsruhe Brainstormers a reinforcement learning approach to robotic soccer\u201d in RoboCup-2000: Robot Soccer World Cup IV vol. 2019 of Lecture Notes in Computer Science P. Stone T. Balch G. Kraetzschmar Eds. (Springer 2000) pp. 367\u2013372.","DOI":"10.1007\/3-540-45324-5_40"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","unstructured":"K. Tuyls S. Maes B. Manderick \u201cReinforcement learning in large state spaces\u201d in RoboCup 2002: Robot Soccer World Cup VI vol. 2752 of Lecture Notes in Computer Science G. A. Kaminka P. U. Lima R. Rojas Eds. (Springer 2002) pp. 319\u2013326.","DOI":"10.1007\/978-3-540-45135-8_27"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-009-9120-4"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1177\/105971230501300301"},{"key":"e_1_3_2_62_2","doi-asserted-by":"crossref","unstructured":"S. Kalyanakrishnan P. Stone \u201cLearning complementary multiagent behaviors: A case study\u201d in RoboCup 2009: Robot Soccer World Cup XIII vol. 5949 of Lecture Notes in Computer Science J. Baltes M. G. Lagoudakis T. Naruse S. S. Ghidary Eds. (Springer 2010) pp. 
153\u2013165.","DOI":"10.1007\/978-3-642-11876-0_14"},{"key":"e_1_3_2_63_2","doi-asserted-by":"crossref","unstructured":"S. Kalyanakrishnan Y. Liu P. Stone \u201cHalf field offense in RoboCup soccer: A multiagent reinforcement learning case study\u201d in RoboCup-2006: Robot Soccer World Cup X vol. 4434 of Lecture Notes in Artificial Intelligence G. Lakemeyer E. Sklar D. Sorenti T. Takahashi Eds. (Springer 2007) pp. 72\u201385.","DOI":"10.1007\/978-3-540-74024-7_7"},{"key":"e_1_3_2_64_2","doi-asserted-by":"crossref","unstructured":"P. Stone M. Veloso \u201cLayered learning\u201d in European Conference on Machine Learning (Springer 2000) pp. 369\u2013381.","DOI":"10.1007\/3-540-45164-1_38"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2017.09.001"},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","unstructured":"M. Abreu L. P. Reis N. Lau \u201cLearning to run faster in a humanoid robot soccer environment through reinforcement learning\u201d in Robot World Cup (Springer 2019) pp. 3\u201315.","DOI":"10.1007\/978-3-030-35699-6_1"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-021-01355-9"},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","unstructured":"M. Saggar T. D\u2019Silva N. Kohl P. Stone \u201cAutonomous learning of stable quadruped locomotion\u201d in RoboCup-2006: Robot Soccer World Cup X vol. 4434 of Lecture Notes in Artificial Intelligence G. Lakemeyer E. Sklar D. Sorenti T. Takahashi Eds. (Springer 2007) pp. 98\u2013109.","DOI":"10.1007\/978-3-540-74024-7_9"},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"M. Hausknecht P. Stone \u201cLearning powerful kicks on the Aibo ERS-7: The quest for a striker\u201d in RoboCup-2010: Robot Soccer World Cup XIV vol. 6556 of Lecture Notes in Artificial Intelligence J. R. del Solar E. Chown P. G. Pl\u00f6ger Eds. (Springer 2011) pp. 254\u201365.","DOI":"10.1007\/978-3-642-20217-9_22"},{"key":"e_1_3_2_70_2","unstructured":"A. Farchy S. 
Barrett P. MacAlpine P. Stone \u201cHumanoid robots learning to walk faster: From the real world to simulation and back\u201d in Proceedings of 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013) pp. 39\u201346."},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3987"},{"key":"e_1_3_2_72_2","unstructured":"A. Abdolmaleki S. Huang L. Hasenclever M. Neunert F. Song M. Zambelli M. Martins N. Heess R. Hadsell M. Riedmiller \u201cA distributional view on multi-objective policy optimization\u201d in International Conference on Machine Learning (MLResearchPress 2020) pp. 11\u201322."},{"key":"e_1_3_2_73_2","unstructured":"A. Ray J. Achiam D. Amodei Benchmarking safe exploration in deep reinforcement learning. arXiv:2310.03225 (2019)."},{"key":"e_1_3_2_74_2","unstructured":"Y. Tassa Y. Doron A. Muldal T. Erez Y. Li D. de Las Casas D. Budden A. Abdolmaleki J. Merel A. Lefrancq T. P. Lillicrap M. A. Riedmiller Deepmind control suite. arXiv:1801.00690 [cs.AI] (2 January 2018)."},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_3_2_76_2","doi-asserted-by":"crossref","unstructured":"A. Byravan J. Humplik L. Hasenclever A. Brussee F. Nori T. Haarnoja B. Moran S. Bohez F. Sadeghi B. Vujatovic N. Heess \u201cNeRF2Real: Sim2real transfer of vision-guided bipedal motion skills using neural radiance fields\u201d in Proceedings of IEEE International Conference on Robotics and Automation (ICRA) (IEEE 2023) pp. 9362\u20139369.","DOI":"10.1109\/ICRA48891.2023.10161544"},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","unstructured":"E. Todorov T. Erez Y. Tassa \u201cMujoco: A physics engine for model-based control\u201d in 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IEEE 2012) pp. 
5026\u20135033.","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpa.2020.100022"},{"key":"e_1_3_2_79_2","unstructured":"Optitrack \u201cMotive optical motion capture software \u201d March 2023; https:\/\/optitrack.com\/."},{"key":"e_1_3_2_80_2","unstructured":"A. Abdolmaleki J. T. Springenberg Y. Tassa R. Munos N. Heess M. Riedmiller \u201cMaximum a posteriori policy optimisation\u201d in Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)."},{"key":"e_1_3_2_81_2","unstructured":"M. G. Bellemare W. Dabney R. Munos \u201cA distributional perspective on reinforcement learning\u201d in Proceedings of the 34th International Conference on Machine Learning (ACM 2017) pp. 449\u2013458."},{"key":"e_1_3_2_82_2","unstructured":"J. Heinrich M. Lanctot D. Silver \u201cFictitious self-play in extensive-form games\u201d in Proceedings of the 32nd International Conference on Machine Learning vol. 37 of JMLR Workshop and Conference Proceedings F. R. Bach D. M. Blei Eds. (ACM 2015) pp. 805\u2013813."},{"key":"e_1_3_2_83_2","first-page":"4190","article-title":"A unified game-theoretic approach to multiagent reinforcement learning","volume":"30","author":"Lanctot M.","year":"2017","unstructured":"M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning. Adv. Neural Inf. Process. Syst. 30, 4190\u20134203 (2017).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_84_2","unstructured":"A. A. Rusu S. G. Colmenarejo C. Gulcehre G. Desjardins J. Kirkpatrick R. Pascanu V. Mnih K. Kavukcuoglu R. Hadsell Policy distillation. arXiv:1511.06295 (2015)."},{"key":"e_1_3_2_85_2","unstructured":"E. Parisotto J. L. Ba R. Salakhutdinov Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv:1511.06342 (2015)."},{"key":"e_1_3_2_86_2","unstructured":"Y. Teh V. Bapst W. 
M. Czarnecki J. Quan J. Kirkpatrick R. Hadsell N. Heess R. Pascanu Distral: Robust multitask reinforcement learning. Adv. Neural Inf. Process. Syst. 30 (2017)."},{"key":"e_1_3_2_87_2","unstructured":"A. Galashov S. Jayakumar L. Hasenclever D. Tirumala J. Schwarz G. Desjardins W. M. Czarnecki Y. W. Teh R. Pascanu N. Heess \u201cInformation asymmetry in KL-regularized RL\u201d in International Conference on Learning Representations New Orleans LA 6 to 9 May 2019."},{"key":"e_1_3_2_88_2","unstructured":"S. Schmitt J. J. Hudson A. \u017d\u00eddek S. Osindero C. Doersch W. M. Czarnecki J. Z. Leibo H. K\u00fcttler A. Zisserman K. Simonyan S. M. A. Eslami Kickstarting deep reinforcement learning. arXiv:1803.03835 (2018)."},{"key":"e_1_3_2_89_2","unstructured":"A. Abdolmaleki S. H. Huang G. Vezzani B. Shahriari J. T. Springenberg S. Mishra D. TB A. Byravan K. Bousmalis A. Gyorgy C. Szepesvari R. Hadsell N. Heess M. Riedmiller On multi-objective policy optimization as a tool for reinforcement learning. arXiv:2106.08199 (2021)."},{"key":"e_1_3_2_90_2","unstructured":"A. Stooke J. Achiam P. Abbeel \u201cResponsive safety in reinforcement learning by pid lagrangian methods\u201d in Proceedings of the 37th International Conference on Machine Learning (ICML 2020) pp. 9133\u20139143."},{"key":"e_1_3_2_91_2","unstructured":"S. Liu G. Lever J. Merel S. Tunyasuvunakool N. Heess T. Graepel \u201cEmergent coordination through competition\u201d in International Conference on Learning Representations New Orleans LA 6 to 9 May 2019."},{"key":"e_1_3_2_92_2","unstructured":"S. Thrun A. Schwartz Finding structure in reinforcement learning. Adv. Neural Inf. Process. Syst. 7 (1994)."},{"key":"e_1_3_2_93_2","unstructured":"M. Bowling M. Veloso \u201cReusing learned policies between similar problems\u201d in Proceedings of the AI* AI-98 Workshop on New Trends in Robotics (1998); https:\/\/cs.cmu.edu\/afs\/cs\/user\/mmv\/www\/papers\/rl-reuse.pdf."},{"key":"e_1_3_2_94_2","unstructured":"X. B. 
Peng M. Chang G. Zhang P. Abbeel S. Levine \u201cMCP: learning composable hierarchical control with multiplicative compositional policies\u201d in Advances in Neural Information Processing Systems H. M. Wallach H. Larochelle A. Beygelzimer F. d\u2019Alch\u00e9-Buc E. B. Fox R. Garnett Eds. (MIT Press 2019) pp. 3681\u20133692."},{"key":"e_1_3_2_95_2","unstructured":"M. Wulfmeier D. Rao R. Hafner T. Lampe A. Abdolmaleki T. Hertweck M. Neunert D. Tirumala N. Siegel N. Heess M. Riedmiller \u201cData-efficient hindsight off-policy option learning\u201d in International Conference on Machine Learning (MLResearchPress 2021) pp. 11340\u201311350."},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459761"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_3_2_98_2","unstructured":"S. Salter M. Wulfmeier D. Tirumala N. Heess M. Riedmiller R. Hadsell D. Rao \u201cMo2: Model-based offline options\u201d in Conference on Lifelong Learning Agents (MLResearchPress 2022) pp. 902\u2013919."},{"key":"e_1_3_2_99_2","unstructured":"S. Ross G. Gordon D. Bagnell \u201cA reduction of imitation learning and structured prediction to no-regret online learning\u201d in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011) pp. 627\u2013635."},{"key":"e_1_3_2_100_2","first-page":"9989","article-title":"Behavior priors for efficient reinforcement learning","volume":"23","author":"Tirumala D.","year":"2022","unstructured":"D. Tirumala, A. Galashov, H. Noh, L. Hasenclever, R. Pascanu, J. Schwarz, G. Desjardins, W. M. Czarnecki, A. Ahuja, Y. W. Teh et al., Behavior priors for efficient reinforcement learning. J. Mach. Learn. Res. 23, 9989\u201310056 (2022).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_101_2","unstructured":"M. Riedmiller R. Hafner T. Lampe M. Neunert J. Degrave T. van de Wiele V. Mnih N. Heess J. T. 
Springenberg \u201cLearning by playing solving sparse reward tasks from scratch\u201d in Proceedings of the 35th International Conference on Machine Learning (ACM 2018) pp. 4344\u20134353."},{"key":"e_1_3_2_102_2","unstructured":"G. Vezzani D. Tirumala M. Wulfmeier D. Rao A. Abdolmaleki B. Moran T. Haarnoja J. Humplik R. Hafner M. Neunert C. Fantacci T. Hertweck T. Lampe F. Sadeghi N. Heess M. Riedmiller Skills: Adaptive skill sequencing for efficient temporally-extended exploration. arXiv:2211.13743 (2022)."},{"key":"e_1_3_2_103_2","unstructured":"A. A. Team J. Bauer K. Baumli S. Baveja F. M. P. Behbahani A. Bhoopchand N. Bradley-Schmieg M. Chang N. Clay A. Collister V. Dasagi L. Gonzalez K. Gregor E. Hughes S. Kashem M. Loks-Thompson H. Openshaw J. Parker-Holder S. Pathak N. P. Nieves N. Rakicevic T. Rockt\u00e4schel Y. Schroecker J. Sygnowski K. Tuyls S. York A. Zacherl L. M. Zhang Human-timescale adaptation in an open-ended task space. arXiv:2301.07608 (2023)."},{"key":"e_1_3_2_104_2","unstructured":"R. Hafner T. Hertweck P. Kl\u00f6ppner M. Bloesch M. Neunert M. Wulfmeier S. Tunyasuvunakool N. Heess M. Riedmiller \u201cTowards general and autonomous learning of core skills: A case study in locomotion\u201d in Conference on Robot Learning (MLResearchPress 2021) pp. 1084\u20131099."},{"key":"e_1_3_2_105_2","doi-asserted-by":"crossref","unstructured":"M. Wulfmeier A. Abdolmaleki R. Hafner J. T. Springenberg M. Neunert T. Hertweck T. Lampe N. Siegel N. Heess M. Riedmiller Compositional transfer in hierarchical reinforcement learning. arXiv:1906.11228 (2019).","DOI":"10.15607\/RSS.2020.XVI.054"},{"key":"e_1_3_2_106_2","unstructured":"D. Balduzzi M. Garnelo Y. Bachrach W. Czarnecki J. P\u00e9rolat M. Jaderberg T. Graepel \u201cOpen-ended learning in symmetric zero-sum games\u201d in Proceedings of the 36th International Conference on Machine Learning (ICML) vol. 97 of Proceedings of Machine Learning Research K. Chaudhuri R. Salakhutdinov Eds. 
(MLResearchPress 2019) pp. 434\u2013443."},{"key":"e_1_3_2_107_2","unstructured":"G. W. Brown \u201cIterative solution of games by fictitious play\u201d in Activity Analysis of Production and Allocation T. C. Koopmans Ed. (Wiley 1951)."},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_3_2_109_2","unstructured":"B. Baker I. Kanitscheider T. Markov Y. Wu G. Powell B. McGrew I. Mordatch \u201cEmergent tool use from multi-agent autocurricula\u201d in 8th International Conference on Learning Representations (ICLR 2020)."},{"key":"e_1_3_2_110_2","unstructured":"R. S. Sutton A. G. Barto Reinforcement Learning: An Introduction (MIT Press 2018)."},{"key":"e_1_3_2_111_2","unstructured":"J. Schulman S. Levine P. Abbeel M. Jordan P. Moritz \u201cTrust region policy optimization\u201d in Proceedings of the 32nd International Conference on Machine Learning (ICML) (ACM 2015) pp. 1889\u20131897."},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_113_2","unstructured":"T. Haarnoja B. Moran G. Lever S. H. Huang D. Tirumala J. Humplik M. Wulfmeier S. Tunyasuvunakool N. Y. Siegel R. Hafner M. Bloesch K. Hartikainen A. Byravan L. Hasenclever T. Y. F. Sadeghi N. Batchelor F. Casarini S. Saliceti C. Game N. Sreendra K. Patel M. Gwira A. Huber N. Hurley F. Nori R. Hadsell N. 
Heess Data release for: Learning agile soccer skills for a bipedal robot with deep reinforcement learning [data set] 2024; https:\/\/doi.org\/10.5281\/zenodo.10793725."}],"container-title":["Science Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.science.org\/doi\/pdf\/10.1126\/scirobotics.adi8022","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T17:58:31Z","timestamp":1712771911000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.science.org\/doi\/10.1126\/scirobotics.adi8022"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,10]]},"references-count":112,"journal-issue":{"issue":"89","published-print":{"date-parts":[[2024,4,10]]}},"alternative-id":["10.1126\/scirobotics.adi8022"],"URL":"https:\/\/doi.org\/10.1126\/scirobotics.adi8022","relation":{},"ISSN":["2470-9476"],"issn-type":[{"value":"2470-9476","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,10]]},"assertion":[{"value":"2023-05-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-14","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"eadi8022"}}