{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T09:29:24Z","timestamp":1778923764729,"version":"3.51.4"},"reference-count":224,"publisher":"SAGE Publications","issue":"11","license":[{"start":{"date-parts":[[2013,8,23]],"date-time":"2013-08-23T00:00:00Z","timestamp":1377216000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2013,9]]},"abstract":"<jats:p>Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. 
By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.<\/jats:p>","DOI":"10.1177\/0278364913495721","type":"journal-article","created":{"date-parts":[[2013,8,23]],"date-time":"2013-08-23T20:15:57Z","timestamp":1377288957000},"page":"1238-1274","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2309,"title":["Reinforcement learning in robotics: A survey"],"prefix":"10.1177","volume":"32","author":[{"given":"Jens","family":"Kober","sequence":"first","affiliation":[{"name":"Bielefeld University, CoR-Lab Research Institute for Cognition and Robotics, Bielefeld, Germany"},{"name":"Honda Research Institute Europe, Offenbach\/Main, Germany"}]},{"given":"J. Andrew","family":"Bagnell","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Robotics Institute, Pittsburgh, PA, USA"}]},{"given":"Jan","family":"Peters","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Intelligent Systems, Department of Empirical Inference, T\u00fcbingen, Germany"},{"name":"Technische Universit\u00e4t Darmstadt, FB Informatik, FG Intelligent Autonomous Systems, Darmstadt, Germany"}]}],"member":"179","published-online":{"date-parts":[[2013,8,23]]},"reference":[{"key":"bibr1-0278364913495721","doi-asserted-by":"crossref","unstructured":"Abbeel P, Coates A, Quigley M, Ng AY (2007) An application of reinforcement learning to aerobatic helicopter flight. 
In: Advances in Neural Information Processing Systems (NIPS).","DOI":"10.7551\/mitpress\/7503.003.0006"},{"key":"bibr2-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"bibr3-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143845"},{"key":"bibr4-0278364913495721","volume-title":"Model-based control of a robot manipulator","author":"An CH","year":"1988"},{"key":"bibr5-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2008.4651020"},{"key":"bibr6-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2008.10.024"},{"key":"bibr7-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1002\/rob.4620010203"},{"key":"bibr8-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/BF00117447"},{"key":"bibr9-0278364913495721","volume-title":"Adaptive Control","author":"\u00c5str\u00f6m KJ","year":"1989"},{"key":"bibr10-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Atkeson CG","year":"1994"},{"key":"bibr11-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Atkeson CG","year":"1998"},{"key":"bibr12-0278364913495721","first-page":"75","volume":"11","author":"Atkeson CG","year":"1997","journal-title":"AI Review"},{"key":"bibr13-0278364913495721","volume-title":"International Conference on Machine Learning (ICML)","author":"Atkeson CG","year":"1997"},{"key":"bibr14-0278364913495721","volume-title":"Learning Decisions: Robustness, Uncertainty, and Approximation","author":"Bagnell JA","year":"2004"},{"key":"bibr15-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Bagnell JA","year":"2003"},{"key":"bibr16-0278364913495721","volume-title":"International Joint Conference on Artificial Intelligence (IJCAI)","author":"Bagnell 
JA","year":"2003"},{"key":"bibr17-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2001.932842"},{"key":"bibr18-0278364913495721","doi-asserted-by":"publisher","DOI":"10.21236\/ADA280844"},{"key":"bibr19-0278364913495721","volume-title":"IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Bakker B","year":"2003"},{"key":"bibr20-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2006.1642157"},{"key":"bibr21-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1023\/A:1025696116075"},{"key":"bibr22-0278364913495721","volume-title":"Dynamic Programming","author":"Bellman RE","year":"1957"},{"key":"bibr23-0278364913495721","volume-title":"Introduction to the Mathematical Theory of Control Processes","author":"Bellman RE","year":"1967"},{"key":"bibr24-0278364913495721","volume-title":"Introduction to the Mathematical Theory of Control Processes","author":"Bellman RE","year":"1971"},{"key":"bibr25-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.1992.287219"},{"key":"bibr26-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(97)00043-2"},{"key":"bibr27-0278364913495721","volume-title":"Learning from Observation Using Primitives","author":"Bentivegna DC","year":"2004"},{"key":"bibr28-0278364913495721","series-title":"Springer Tracts in Advanced Robotics","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1007\/11008941_59","volume-title":"Robotics Research","volume":"15","author":"Bentivegna DC","year":"2004"},{"key":"bibr29-0278364913495721","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas DP","year":"1995"},{"key":"bibr30-0278364913495721","series-title":"Advances in Design and Control","volume-title":"Practical methods for optimal control using nonlinear programming","volume":"3","author":"Betts JT","year":"2001"},{"key":"bibr31-0278364913495721","volume-title":"Reinforcement learning in sensor-guided AIBO 
robots","author":"Birdwell N","year":"2007"},{"key":"bibr32-0278364913495721","series-title":"Information Science and Statistics","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop C","year":"2006"},{"key":"bibr33-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2010.5650243"},{"key":"bibr34-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Boyan JA","year":"1995"},{"key":"bibr35-0278364913495721","first-page":"213","volume":"3","author":"Brafman RI","year":"2002","journal-title":"Journal of Machine Learning Research"},{"key":"bibr36-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364911402527"},{"key":"bibr37-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/TCST.2005.847335"},{"key":"bibr38-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1201\/9781439821091"},{"key":"bibr39-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1538788.1538812"},{"key":"bibr40-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2006.282061"},{"key":"bibr41-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/CIRA.2007.382878"},{"key":"bibr42-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386047"},{"key":"bibr43-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5106-x"},{"key":"bibr44-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.2.271"},{"key":"bibr45-0278364913495721","volume-title":"28th International Conference on Machine Learning (ICML)","author":"Deisenroth MP","year":"2011"},{"key":"bibr46-0278364913495721","doi-asserted-by":"crossref","unstructured":"Deisenroth MP, Rasmussen CE, Fox D (2011) Learning to control a low-cost manipulator using data-efficient reinforcement learning. 
In: Robotics: Science and Systems (RSS).","DOI":"10.15607\/RSS.2011.VII.008"},{"key":"bibr47-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/3477.499790"},{"key":"bibr48-0278364913495721","volume-title":"American Mathematical Society Conference Math Challenges of the 21st Century","author":"Donoho DL","year":"2000"},{"key":"bibr49-0278364913495721","volume-title":"International Computer Science Institute","author":"Dorigo M","year":"1993"},{"key":"bibr50-0278364913495721","unstructured":"Duan Y, Cui B, Yang H (2008) Robot navigation based on fuzzy RL algorithm. In: International Symposium on Neural Networks (ISNN)."},{"key":"bibr51-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2007.01.003"},{"key":"bibr52-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364907084980"},{"key":"bibr53-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2007.08.001"},{"key":"bibr54-0278364913495721","unstructured":"Fagg AH, Lotspeich DL, Hoff J, Bekey GA (1998) Rapid reinforcement learning for reactive control policy design for autonomous robots. In: Artificial Life in Robotics."},{"key":"bibr55-0278364913495721","unstructured":"Fidelman P, Stone P (2004) Learning ball acquisition on a physical robot. 
In: International Symposium on Robotics and Automation (ISRA)."},{"key":"bibr56-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0142331209104155"},{"key":"bibr57-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2000.894638"},{"key":"bibr58-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Geng T","year":"2006"},{"key":"bibr59-0278364913495721","doi-asserted-by":"publisher","DOI":"10.21236\/ADA197085"},{"key":"bibr60-0278364913495721","volume-title":"Genetic algorithms","author":"Goldberg DE","year":"1989"},{"key":"bibr61-0278364913495721","volume-title":"School of Computer Science, Carnegie Mellon University","author":"Gordon GJ","year":"1999"},{"key":"bibr62-0278364913495721","volume-title":"Joint International Symposium on Robotics (ISR) and German Conference on Robotics (ROBOTIK)","author":"Gr\u00e4ve K","year":"2010"},{"key":"bibr63-0278364913495721","first-page":"1471","volume":"5","author":"Greensmith E","year":"2004","journal-title":"Journal of Machine Learning Research"},{"key":"bibr64-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1163\/156855307782148550"},{"key":"bibr65-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/37.257890"},{"key":"bibr66-0278364913495721","volume-title":"IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Hafner R","year":"2003"},{"key":"bibr67-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2007.363631"},{"key":"bibr68-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ICSMC.1998.728096"},{"key":"bibr69-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/TAMD.2010.2103311"},{"key":"bibr70-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509181"},{"key":"bibr71-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2012.6225072"},{"key":"bibr72-0278364913495721","unstructured":"Huang X, Weng J 
(2002) Novelty and reinforcement learning in the value system of developmental robots. In: 2nd International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems."},{"key":"bibr73-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(97)00044-4"},{"key":"bibr74-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Ijspeert AJ","year":"2003"},{"key":"bibr75-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1999.770457"},{"key":"bibr76-0278364913495721","doi-asserted-by":"crossref","unstructured":"Jaakkola T, Jordan MI, Singh SP (1993) Convergence of stochastic iterative dynamic programming algorithms. In: Advances in Neural Information Processing Systems (NIPS).","DOI":"10.21236\/ADA276517"},{"key":"bibr77-0278364913495721","volume-title":"Differential Dynamic Programming","author":"Jacobson DH","year":"1970"},{"key":"bibr78-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-59496-5_337"},{"key":"bibr79-0278364913495721","volume-title":"Stanford University, Stanford","author":"Kaelbling LP","year":"1990"},{"key":"bibr80-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1613\/jair.301"},{"key":"bibr81-0278364913495721","volume-title":"On the Sample Complexity of Reinforcement Learning","author":"Kakade S","year":"2003"},{"key":"bibr82-0278364913495721","volume-title":"International Conference on Machine Learning (ICML)","author":"Kakade S","year":"2002"},{"key":"bibr83-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6095096"},{"key":"bibr84-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1115\/1.3653115"},{"key":"bibr85-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-49240-2_3"},{"key":"bibr86-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/2005\/11\/P11011"},{"key":"bibr87-0278364913495721","doi-asserted-by":"crossref","unstructured":"Katz 
D, Pyuro Y, Brock O (2008) Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Robotics: Science and Systems (RSS).","DOI":"10.15607\/RSS.2008.IV.033"},{"issue":"3","key":"bibr88-0278364913495721","first-page":"365","volume":"6","author":"Kawato M","year":"1990","journal-title":"Advanced Neural Computers"},{"key":"bibr89-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1023\/A:1017984413808"},{"key":"bibr90-0278364913495721","volume-title":"Decisions with multiple objectives: Preferences and value tradeoffs","author":"Keeney R","year":"1976"},{"key":"bibr91-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2001.980135"},{"key":"bibr92-0278364913495721","unstructured":"Kirchner F (1997) Q-learning of complex behaviours on a six-legged walking machine. In: EUROMICRO Workshop on Advanced Mobile Robots."},{"key":"bibr93-0278364913495721","volume-title":"Optimal control theory","author":"Kirk DE","year":"1970"},{"key":"bibr94-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2007.363075"},{"key":"bibr95-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2008.4650953"},{"key":"bibr96-0278364913495721","doi-asserted-by":"crossref","unstructured":"Kober J, Oztop E, Peters J (2010) Reinforcement learning to adjust robot movements to new situations. In: Robotics: Science and Systems (RSS).","DOI":"10.15607\/RSS.2010.VI.005"},{"key":"bibr97-0278364913495721","doi-asserted-by":"crossref","unstructured":"Kober J, Peters J (2009) Policy search for motor primitives in robotics. 
In: Advances in Neural Information Processing Systems (NIPS).","DOI":"10.1109\/ROBOT.2009.5152577"},{"issue":"1","key":"bibr98-0278364913495721","first-page":"171","volume":"84","author":"Kober J","year":"2010","journal-title":"Machine Learning"},{"key":"bibr99-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2004.1307456"},{"key":"bibr100-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364907087426"},{"key":"bibr101-0278364913495721","unstructured":"Kolter JZ, Abbeel P, Ng AY (2007) Hierarchical apprenticeship learning with application to quadruped locomotion. In: Advances in Neural Information Processing Systems (NIPS)."},{"key":"bibr102-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390218"},{"key":"bibr103-0278364913495721","doi-asserted-by":"crossref","unstructured":"Kolter JZ, Ng AY (2009a) Policy search via the signed derivative. In: Robotics: Science and Systems (RSS).","DOI":"10.7551\/mitpress\/8727.003.0028"},{"key":"bibr104-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553442"},{"key":"bibr105-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509562"},{"key":"bibr106-0278364913495721","volume-title":"AAAI Conference on Artificial Intelligence (AAAI)","author":"Konidaris GD","year":"2011"},{"key":"bibr107-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364911428653"},{"key":"bibr108-0278364913495721","volume-title":"AAAI Conference on Artificial Intelligence (AAAI)","author":"Konidaris GD","year":"2011"},{"key":"bibr109-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2009.5354345"},{"key":"bibr110-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2010.06.001"},{"key":"bibr111-0278364913495721","doi-asserted-by":"crossref","unstructured":"Kuhn HW, Tucker AW (1950) Nonlinear programming. 
In: Berkeley Symposium on Mathematical Statistics and Probability.","DOI":"10.1525\/9780520411586-036"},{"key":"bibr112-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/Humanoids.2011.6100881"},{"key":"bibr113-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2004.1389903"},{"key":"bibr114-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102411"},{"key":"bibr115-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74024-7_5"},{"key":"bibr116-0278364913495721","volume-title":"University of Illinois at Urbana-Champaign","author":"Laud AD","year":"2004"},{"key":"bibr117-0278364913495721","first-page":"89","volume-title":"The Handbook of Markov Decision Processes: Methods and Applications","author":"Lewis ME","year":"2001"},{"key":"bibr118-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6385878"},{"key":"bibr119-0278364913495721","volume-title":"International Joint Conference on Artificial Intelligence (IJCAI)","author":"Lizotte D","year":"2007"},{"key":"bibr120-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(92)90058-6"},{"key":"bibr121-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2005.1570760"},{"key":"bibr122-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-335-6.50030-1"},{"key":"bibr123-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008819414322"},{"key":"bibr124-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102426"},{"key":"bibr125-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2005.1545206"},{"key":"bibr126-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(96)00043-3"},{"key":"bibr127-0278364913495721","first-page":"1711","volume-title":"29th International Conference on Machine Learning (ICML)","author":"Moldovan 
TM","year":"2012"},{"key":"bibr128-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993104"},{"key":"bibr129-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(01)00113-0"},{"key":"bibr130-0278364913495721","author":"Muelling K","year":"2012","journal-title":"The International Journal of Robotics Research"},{"key":"bibr131-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364908091463"},{"key":"bibr132-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ICHR.2009.5379568"},{"key":"bibr133-0278364913495721","doi-asserted-by":"crossref","unstructured":"Nemec B, Zorko M, Zlajpah L (2010) Learning of a ball in a cup playing robot. In: International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD).","DOI":"10.1109\/RAAD.2010.5524570"},{"key":"bibr134-0278364913495721","unstructured":"Ng AY, Coates A, Diel M, (2004a) Autonomous inverted helicopter flight via reinforcement learning. In: International Symposium on Experimental Robotics (ISER)."},{"key":"bibr135-0278364913495721","volume-title":"International Conference on Machine Learning (ICML)","author":"Ng AY","year":"1999"},{"key":"bibr136-0278364913495721","unstructured":"Ng AY, Kim HJ, Jordan MI, Sastry S (2004b) Autonomous helicopter flight via reinforcement learning. 
In: Advances in Neural Information Processing Systems (NIPS)."},{"key":"bibr137-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/TRA.2002.999653"},{"key":"bibr138-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509420"},{"key":"bibr139-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74565-5_19"},{"key":"bibr140-0278364913495721","volume-title":"IEEE International Conference on Humanoid Robots (HUMANOIDS)","author":"Park DH","year":"2008"},{"key":"bibr141-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980200"},{"key":"bibr142-0278364913495721","doi-asserted-by":"crossref","unstructured":"Pendrith M (1999) Reinforcement learning in situated agents: Some theoretical problems and practical solutions. In: European Workshop on Learning Robots (EWRL).","DOI":"10.1007\/3-540-40044-3_6"},{"key":"bibr143-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114731"},{"key":"bibr144-0278364913495721","first-page":"803","volume":"3","author":"Perkins TJ","year":"2002","journal-title":"Journal of Machine Learning Research"},{"key":"bibr145-0278364913495721","volume-title":"National Conference on Artificial Intelligence (AAAI)","author":"Peters J","year":"2010"},{"key":"bibr146-0278364913495721","doi-asserted-by":"crossref","unstructured":"Peters J, Muelling K, Kober J, Nguyen-Tuong D, Kroemer O (2010b) Towards motor skill learning for robotics. 
In: International Symposium on Robotics Research (ISRR).","DOI":"10.1007\/978-3-642-19457-3_28"},{"key":"bibr147-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364907087548"},{"key":"bibr148-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2007.11.026"},{"key":"bibr149-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2008.02.003"},{"key":"bibr150-0278364913495721","volume-title":"University of Southern California","author":"Peters J","year":"2004"},{"key":"bibr151-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910382464"},{"key":"bibr152-0278364913495721","volume-title":"International Conference on Development and Learning","author":"Platt R","year":"2006"},{"key":"bibr153-0278364913495721","volume-title":"Princeton University","author":"Powell WB","year":"2012"},{"key":"bibr154-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"bibr155-0278364913495721","first-page":"463","volume-title":"International Conference on Machine Learning (ICML)","author":"Randl\u00f8v J","year":"1998"},{"key":"bibr156-0278364913495721","series-title":"Adaptive Computation And Machine Learning","volume-title":"Gaussian Processes for Machine Learning","author":"Rasmussen C","year":"2006"},{"key":"bibr157-0278364913495721","doi-asserted-by":"publisher","DOI":"10.21236\/ADA528601"},{"key":"bibr158-0278364913495721","doi-asserted-by":"crossref","unstructured":"Ratliff N, Bradley D, Bagnell JA, Chestnutt J (2006a) Boosting structured prediction for imitation learning. 
In: Advances in Neural Information Processing Systems (NIPS).","DOI":"10.7551\/mitpress\/7503.003.0149"},{"key":"bibr159-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143936"},{"key":"bibr160-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-009-9120-4"},{"key":"bibr161-0278364913495721","volume-title":"An Introduction to the Approximation of Functions","author":"Rivlin TJ","year":"1969"},{"key":"bibr162-0278364913495721","doi-asserted-by":"crossref","unstructured":"Roberts JW, Manchester I, Tedrake R (2011) Feedback controller parameterizations for reinforcement learning. In: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).","DOI":"10.1109\/ADPRL.2011.5967370"},{"key":"bibr163-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-05181-4_13"},{"key":"bibr164-0278364913495721","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.2004.1384022"},{"key":"bibr165-0278364913495721","volume-title":"International Conference on Machine Learning (ICML)","author":"Ross S","year":"2012"},{"key":"bibr166-0278364913495721","volume-title":"International Conference on Artificial Intelligence and Statistics (AISTATS)","author":"Ross S","year":"2011"},{"key":"bibr167-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995724"},{"key":"bibr168-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2007.4399531"},{"key":"bibr169-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-4321-0"},{"key":"bibr170-0278364913495721","volume-title":"European Conference on Machine Learning (ECML)","author":"R\u00fcckstie\u00df 
T","year":"2008"},{"key":"bibr171-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279964"},{"key":"bibr172-0278364913495721","doi-asserted-by":"publisher","DOI":"10.2307\/2171751"},{"key":"bibr173-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-46084-5_126"},{"key":"bibr174-0278364913495721","unstructured":"Schaal S (1996) Learning from demonstration. In: Advances in Neural Information Processing Systems (NIPS)."},{"key":"bibr175-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S1364-6613(99)01327-3"},{"key":"bibr176-0278364913495721","volume-title":"University of Southern California","author":"Schaal S","year":"2009"},{"key":"bibr177-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/37.257895"},{"key":"bibr178-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015727715131"},{"key":"bibr179-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0079-6123(06)65027-9"},{"key":"bibr180-0278364913495721","unstructured":"Schneider JG (1996) Exploiting model uncertainty estimates for safe dynamic control learning. In: Advances in Neural Information Processing Systems (NIPS)."},{"key":"bibr181-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50045-9"},{"key":"bibr182-0278364913495721","doi-asserted-by":"crossref","unstructured":"Silver D, Bagnell JA, Stentz A (2008) High performance outdoor navigation from overhead data using imitation learning. 
In: Robotics: Science and Systems (RSS).","DOI":"10.15607\/RSS.2008.IV.034"},{"key":"bibr183-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910369715"},{"key":"bibr184-0278364913495721","volume-title":"National Conference on Artificial Intelligence\/Innovative Applications of Artificial Intelligence (AAAI\/IAAI)","author":"Smart WD","year":"1998"},{"key":"bibr185-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2002.1014237"},{"key":"bibr186-0278364913495721","volume-title":"International Conference on Development and Learning (ICDL)","author":"Soni V","year":"2006"},{"key":"bibr187-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Sorg J","year":"2010"},{"key":"bibr188-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1002\/0471722138"},{"key":"bibr189-0278364913495721","volume-title":"International Conference on Machine Learning (ICML)","author":"Strens M","year":"2001"},{"key":"bibr190-0278364913495721","volume-title":"International Conference on Machine Learning (ICML)","author":"Stulp F","year":"2012"},{"key":"bibr191-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6094877"},{"key":"bibr192-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-141-3.50030-4"},{"key":"bibr193-0278364913495721","volume-title":"Reinforcement Learning","author":"Sutton RS","year":"1998"},{"key":"bibr194-0278364913495721","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.1991.4791776"},{"key":"bibr195-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273606"},{"key":"bibr196-0278364913495721","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Sutton RS","year":"1999"},{"key":"bibr197-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0954-1810(01)00027-9"},{"key":"bibr198-0278364913495721","series-title":"Technical Report 94-30-1","volume-title":"H-learning: A 
Reinforcement Learning Method to Optimize Undiscounted Average Reward","author":"Tadepalli P","year":"1994"},{"key":"bibr199-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03040-6_125"},{"key":"bibr200-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2011.07.004"},{"key":"bibr201-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2004.1389841"},{"key":"bibr202-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910369189"},{"key":"bibr203-0278364913495721","volume-title":"Yale Workshop on Adaptive and Learning Systems","author":"Tedrake R","year":"2005"},{"key":"bibr204-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6095076"},{"key":"bibr205-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509336"},{"key":"bibr206-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/0921-8890(95)00022-8"},{"key":"bibr207-0278364913495721","volume-title":"International Florida Artificial Intelligence Research Society Conference (FLAIRS)","author":"Tokic M","year":"2009"},{"key":"bibr208-0278364913495721","volume-title":"Inference and Learning in Dynamic Models","author":"Toussaint 
M","year":"2010"},{"key":"bibr209-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(97)00042-0"},{"key":"bibr210-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/9.580874"},{"key":"bibr211-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1998.677351"},{"key":"bibr212-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509621"},{"key":"bibr213-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-009-9132-0"},{"key":"bibr214-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/RAMECH.2006.252749"},{"key":"bibr215-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ICHR.2010.5686339"},{"key":"bibr216-0278364913495721","unstructured":"Wikipedia (2013) Fosbury Flop. http:\/\/en.wikipedia.org\/wiki\/Fosbury_Flop."},{"key":"bibr217-0278364913495721","volume-title":"Australian Conference on Robotics and Automation","author":"Willgoss RA","year":"1999"},{"key":"bibr218-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"bibr219-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1997.620036"},{"key":"bibr220-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69134-1_25"},{"key":"bibr221-0278364913495721","volume-title":"ICGST International Conference on Automation, Robotics and Autonomous Systems (ARAS)","author":"Youssef SM","year":"2005"},{"key":"bibr222-0278364913495721","volume-title":"Essentials of Robust Control","author":"Zhou K","year":"1997"},{"key":"bibr223-0278364913495721","volume-title":"AAAI Conference on Artificial Intelligence (AAAI)","author":"Ziebart BD","year":"2008"},{"key":"bibr224-0278364913495721","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2012.6225036"}],"container-title":["The International Journal of Robotics 
Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364913495721","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364913495721","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364913495721","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:17:53Z","timestamp":1777457873000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364913495721"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,23]]},"references-count":224,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2013,9]]}},"alternative-id":["10.1177\/0278364913495721"],"URL":"https:\/\/doi.org\/10.1177\/0278364913495721","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,8,23]]}}}