{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T03:28:45Z","timestamp":1775964525666,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2013,7,5]],"date-time":"2013-07-05T00:00:00Z","timestamp":1372982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],
"abstract":"<jats:p>In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. We give a summary of the state-of-the-art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Three recent examples of the application of reinforcement learning to real-world robots are described: a pancake flipping task, a bipedal walking energy minimization task and an archery-based aiming task. In all examples, a state-of-the-art expectation-maximization-based reinforcement learning algorithm is used, and different policy representations are proposed and evaluated for each task. The proposed policy representations offer viable solutions to six rarely-addressed challenges in policy representations: correlations, adaptability, multi-resolution, globality, multi-dimensionality and convergence. Both the successes and the practical difficulties encountered in these examples are discussed. Based on insights from these particular cases, conclusions are drawn about the state-of-the-art and future directions for reinforcement learning in robotics.<\/jats:p>",
"DOI":"10.3390\/robotics2030122","type":"journal-article","created":{"date-parts":[[2013,7,5]],"date-time":"2013-07-05T12:28:23Z","timestamp":1373027303000},"page":"122-148","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":199,"title":["Reinforcement Learning in Robotics: Applications and Real-World Challenges"],"prefix":"10.3390","volume":"2",
"author":[{"given":"Petar","family":"Kormushev","sequence":"first","affiliation":[{"name":"Department of Advanced Robotics, Istituto Italiano di Tecnologia, via Morego 30, 16163 Genova, Italy"}]},{"given":"Sylvain","family":"Calinon","sequence":"additional","affiliation":[{"name":"Department of Advanced Robotics, Istituto Italiano di Tecnologia, via Morego 30, 16163 Genova, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6233-9961","authenticated-orcid":false,"given":"Darwin","family":"Caldwell","sequence":"additional","affiliation":[{"name":"Department of Advanced Robotics, Istituto Italiano di Tecnologia, via Morego 30, 16163 Genova, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2013,7,5]]},
"reference":[
{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Siciliano, B., and Khatib, O. (2008). Handbook of Robotics, Springer.","DOI":"10.1007\/978-3-540-30301-5"},
{"key":"ref_2","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1016\/j.robot.2008.10.024","article-title":"A survey of robot learning from demonstration","volume":"57","author":"Argall","year":"2009","journal-title":"Robot. Auton. Syst."},
{"key":"ref_3","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1163\/016918611X558261","article-title":"Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input","volume":"25","author":"Kormushev","year":"2011","journal-title":"Adv. Robot."},
{"key":"ref_4","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1007\/s12369-012-0160-0","article-title":"Keyframe-based learning from demonstration","volume":"4","author":"Akgun","year":"2012","journal-title":"Int. J. Soc. Robot."},
{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Nehaniv, C.L., and Dautenhahn, K. (2007). Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, Cambridge University Press.","DOI":"10.1017\/CBO9780511489808"},
{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kormushev, P., Nenchev, D.N., Calinon, S., and Caldwell, D.G. (2011, January 9\u201313). Upper-Body Kinesthetic Teaching of a Free-Standing Humanoid Robot. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China.","DOI":"10.1109\/ICRA.2011.5979537"},
{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},
{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pastor, P., Kalakrishnan, M., Chitta, S., Theodorou, E., and Schaal, S. (2011, January 9\u201313). Skill Learning and Task Outcome Prediction for Manipulation. Proceedings of the International Conference on Robotics and Automation (ICRA), Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980200"},
{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Stulp, F., Buchli, J., Theodorou, E., and Schaal, S. (2010, January 6\u20138). Reinforcement Learning of Full-Body Humanoid Motor Skills. Proceedings of the IEEE International Conference on Humanoid Robots (Humanoids), Nashville, TN, USA.","DOI":"10.1109\/ICHR.2010.5686320"},
{"key":"ref_10","unstructured":"Peters, J., and Schaal, S. (, January October). Policy Gradient Methods for Robotics. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China."},
{"key":"ref_11","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1145\/1538788.1538812","article-title":"Apprenticeship learning for helicopter control","volume":"52","author":"Coates","year":"2009","journal-title":"Commun. ACM"},
{"key":"ref_12","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1016\/j.robot.2006.03.002","article-title":"Learning at the level of synergies for a robot weightlifter","volume":"54","author":"Rosenstein","year":"2006","journal-title":"Robot. Auton. Syst."},
{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Kormushev, P., Calinon, S., and Caldwell, D.G. (2010, January 18\u201322). Robot Motor Skill Coordination with EM-Based Reinforcement Learning. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan.","DOI":"10.1109\/IROS.2010.5649089"},
{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kormushev, P., Calinon, S., Saegusa, R., and Metta, G. (2010, January 6\u20138). Learning the Skill of Archery by a Humanoid Robot iCub. Proceedings of the IEEE International Conference on Humanoid Robots (Humanoids), Nashville, TN, USA.","DOI":"10.1109\/ICHR.2010.5686841"},
{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kormushev, P., Calinon, S., Ugurlu, B., and Caldwell, D.G. (2012, January 10\u201315). Challenges for the Policy Representation When Applying Reinforcement Learning in Robotics. Proceedings of the International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia.","DOI":"10.1109\/IJCNN.2012.6252758"},
{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1180","DOI":"10.1016\/j.neucom.2007.11.026","article-title":"Natural actor-critic","volume":"71","author":"Peters","year":"2008","journal-title":"Neurocomputing"},
{"key":"ref_17","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. Learn."},
{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kober, J., and Peters, J. (2009, January 12\u201317). Learning Motor Primitives for Robotics. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan.","DOI":"10.1109\/ROBOT.2009.5152577"},
{"key":"ref_19","first-page":"3137","article-title":"A generalized path integral control approach to reinforcement learning","volume":"11","author":"Theodorou","year":"2010","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_20","unstructured":"Rubinstein, R., and Kroese, D. (2004). The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning, Springer-Verlag."},
{"key":"ref_21","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/3-540-32494-1_4","article-title":"The CMA evolution strategy: A comparing review","volume":"Volume 192","author":"Lozano","year":"2006","journal-title":"Towards a New Evolutionary Computation"},
{"key":"ref_22","unstructured":"Stulp, F., and Sigaud, O. (July, January 26). Path Integral Policy Improvement with Covariance Matrix Adaptation. Proceedings of the International Conference on Machine Learning (ICML), Edinburgh, UK."},
{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1521","DOI":"10.1163\/156855307782148550","article-title":"Reinforcement learning for imitating constrained reaching movements","volume":"21","author":"Guenter","year":"2007","journal-title":"Adv. Robot."},
{"key":"ref_24","first-page":"849","article-title":"Policy search for motor primitives in robotics","volume":"Volume 21","author":"Kober","year":"2009","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"ref_25","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1016\/S0079-6123(06)65027-9","article-title":"Dynamics systems vs. optimal control a unifying view","volume":"165","author":"Schaal","year":"2007","journal-title":"Progr. Brain Res."},
{"key":"ref_26","unstructured":"Ijspeert, A.J., Nakanishi, J., and Schaal, S. (November, January 29). Trajectory Formation for Imitation with Nonlinear Dynamical Systems. Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Maui, HI, USA."},
{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hoffmann, H., Pastor, P., Park, D.H., and Schaal, S. (2009, January 12\u201317). Biologically-Inspired Dynamical Systems for Movement Generation: Automatic Real-Time Goal Adaptation and Obstacle Avoidance. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan.","DOI":"10.1109\/ROBOT.2009.5152423"},
{"key":"ref_28","unstructured":"Kober, J. (2008). Reinforcement Learning for Motor Primitives. [Master\u2019s Thesis, University of Stuttgart]."},
{"key":"ref_29","unstructured":"Peters, J., and Schaal, S. (2007, January 25\u201327). Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning. Proceedings of the 15th European Symposium on Artificial Neural Networks (ESANN 2007), Bruges, Belgium."},
{"key":"ref_30","unstructured":"Pardo, D. (2009). Learning Rest-to-Rest Motor Coordination in Articulated Mobile Robots. [Ph.D. Thesis, Technical University of Catalonia (UPC)]."},
{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1038\/nn963","article-title":"Optimal feedback control as a theory of motor coordination","volume":"5","author":"Todorov","year":"2002","journal-title":"Nat. Neurosci."},
{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Williams, A.M., and Hodges, N.J. (2004). Skill Acquisition in Sport: Research, Theory and Practice, Routledge.","DOI":"10.4324\/9780203646564"},
{"key":"ref_33","doi-asserted-by":"crossref","first-page":"7601","DOI":"10.1073\/pnas.0901512106","article-title":"Simplified and effective motor control based on muscle synergies to exploit musculoskeletal dynamics","volume":"106","author":"Bernikera","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},
{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Calinon, S., Sardellitti, I., and Caldwell, D.G. (2010, January 18\u201322). Learning-Based Control Strategy for Safe Human-Robot Interaction Exploiting Task and Robot Redundancies. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan.","DOI":"10.1109\/IROS.2010.5648931"},
{"key":"ref_35","unstructured":"Calinon, S., Li, Z., Alizadeh, T., Tsagarakis, N.G., and Caldwell, D.G. (December, January 29). Statistical Dynamical Systems for Skills Acquisition in Humanoids. Proceedings of the IEEE International Conference on Humanoid Robots (Humanoids), Osaka, Japan."},
{"key":"ref_36","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/s10994-010-5186-7","article-title":"Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains","volume":"81","author":"Bernstein","year":"2010","journal-title":"Mach. Learn."},
{"key":"ref_37","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/BF00993591","article-title":"The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces","volume":"21","author":"Moore","year":"1995","journal-title":"Mach. Learn."},
{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Kormushev, P., Ugurlu, B., Calinon, S., Tsagarakis, N., and Caldwell, D.G. (2011, January 25\u201330). Bipedal Walking Energy Minimization by Reinforcement Learning with Evolving Policy Parameterization. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6048037"},
{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ugurlu, B., Tsagarakis, N.G., Spyrakos-Papastravridis, E., and Caldwell, D.G. (2011, January 13\u201315). Compliant Joint Modification and Real-Time Dynamic Walking Implementation on Bipedal Robot cCub. Proceedings of the IEEE International Conference on Mechatronics, Istanbul, Turkey.","DOI":"10.1109\/ICMECH.2011.5971230"},
{"key":"ref_40","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1016\/j.neunet.2003.11.004","article-title":"Reinforcement learning with via-point representation","volume":"17","author":"Miyamoto","year":"2004","journal-title":"Neural Netw."},
{"key":"ref_41","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/MRA.2007.380654","article-title":"Learning biped locomotion: Application of poincare-map-based reinforcement learning","volume":"14","author":"Morimoto","year":"2007","journal-title":"IEEE Robot. Autom. Mag."},
{"key":"ref_42","unstructured":"Wada, Y., and Sumita, K. (2004, January 25\u201329). A Reinforcement Learning Scheme for Acquisition of Via-Point Representation of Human Motion. Proceedings of the IEEE International Conference on Neural Networks, Budapest, Hungary."},
{"key":"ref_43","first-page":"66","article-title":"Learning fast quadruped robot gaits with the RL power spline parameterization","volume":"12","author":"Shen","year":"2012","journal-title":"Cybern. Inf. Technol."},
{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1163\/156855307781389419","article-title":"iCub: The design and realization of an open humanoid platform for cognitive and neuroscience research","volume":"21","author":"Tsagarakis","year":"2007","journal-title":"Adv. Robot."}],
"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/2\/3\/122\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:47:44Z","timestamp":1760219264000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/2\/3\/122"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,7,5]]},"references-count":44,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2013,9]]}},"alternative-id":["robotics2030122"],"URL":"https:\/\/doi.org\/10.3390\/robotics2030122","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,7,5]]}}}