{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T14:14:02Z","timestamp":1776953642713,"version":"3.51.4"},"reference-count":77,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2021,1,24]],"date-time":"2021-01-24T00:00:00Z","timestamp":1611446400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Deep learning has provided new ways of manipulating, processing and analyzing data. It sometimes may achieve results comparable to, or surpassing human expert performance, and has become a source of inspiration in the era of artificial intelligence. Another subfield of machine learning named reinforcement learning, tries to find an optimal behavior strategy through interactions with the environment. Combining deep learning and reinforcement learning permits resolving critical issues relative to the dimensionality and scalability of data in tasks with sparse reward signals, such as robotic manipulation and control tasks, that neither method permits resolving when applied on its own. In this paper, we present recent significant progress of deep reinforcement learning algorithms, which try to tackle the problems for the application in the domain of robotic manipulation control, such as sample efficiency and generalization. 
Despite these continuous improvements, the challenges of learning robust and versatile manipulation skills for robots with deep reinforcement learning are still far from being resolved for real-world applications.<\/jats:p>","DOI":"10.3390\/robotics10010022","type":"journal-article","created":{"date-parts":[[2021,1,25]],"date-time":"2021-01-25T12:28:31Z","timestamp":1611577711000},"page":"22","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":159,"title":["Deep Reinforcement Learning for the Control of Robotic Manipulation: A Focussed Mini-Review"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7804-1337","authenticated-orcid":false,"given":"Rongrong","family":"Liu","sequence":"first","affiliation":[{"name":"ICube Lab, Robotics Department, Strasbourg University, UMR 7357 CNRS, 67085 Strasbourg, France"}]},{"given":"Florent","family":"Nageotte","sequence":"additional","affiliation":[{"name":"ICube Lab, Robotics Department, Strasbourg University, UMR 7357 CNRS, 67085 Strasbourg, France"}]},{"given":"Philippe","family":"Zanne","sequence":"additional","affiliation":[{"name":"ICube Lab, Robotics Department, Strasbourg University, UMR 7357 CNRS, 67085 Strasbourg, France"}]},{"given":"Michel","family":"de Mathelin","sequence":"additional","affiliation":[{"name":"ICube Lab, Robotics Department, Strasbourg University, UMR 7357 CNRS, 67085 Strasbourg, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2860-6472","authenticated-orcid":false,"given":"Birgitta","family":"Dresp-Langley","sequence":"additional","affiliation":[{"name":"ICube Lab, UMR 7357, Centre National de la Recherche Scientifique CNRS, 67085 Strasbourg, France"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Dresp-Langley, B., Nageotte, F., Zanne, P., and Mathelin, M.D. (2020). 
Correlating grip force signals from multiple sensors highlights prehensile control strategies in a complex task-user system. Bioengineering, 7.","DOI":"10.20944\/preprints202010.0328.v1"},{"key":"ref_2","unstructured":"Eranki, V.K.P., and Reddy Gurudu, R. (2016). Design and Structural Analysis of a Robotic Arm. [Master\u2019s Thesis, Blekinge Institute of Technology]."},{"key":"ref_3","unstructured":"Christ, R.D., and Wernli, R.L. (2013). The ROV Manual: A User Guide for Remotely Operated Vehicles, Butterworth-Heinemann. [2nd ed.]."},{"key":"ref_4","unstructured":"Marghitu, D.B. (2001). Mechanical Engineer\u2019s Handbook, Academic Press."},{"key":"ref_5","first-page":"1407","article-title":"Design of control system for articulated robot using leap motion sensor","volume":"3","author":"Savatekar","year":"2016","journal-title":"Int. Res. J. Eng. Technol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"101640","DOI":"10.1016\/j.bspc.2019.101640","article-title":"Robotic arm controlling based on a spiking neural circuit and synaptic plasticity","volume":"55","author":"Wei","year":"2020","journal-title":"Biomed. Signal Process. Control"},{"key":"ref_7","first-page":"1729881417738103","article-title":"Navigation control and stability investigation of a mobile robot based on a hexacopter equipped with an integrated manipulator","volume":"14","author":"Ibrahim","year":"2017","journal-title":"Int. J. Adv. Robot. Syst."},{"key":"ref_8","unstructured":"Safdar, B. (2015). Theory of Robotics Arm Control with PLC, Saimaa University of Applied Sciences."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/BF02478291","article-title":"How we know universals the perception of auditory and visual forms","volume":"9","author":"Pitts","year":"1947","journal-title":"Bull. Math. 
Biophys."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1109\/JRPROC.1960.287598","article-title":"Perceptron simulation experiments","volume":"48","author":"Rosenblatt","year":"1960","journal-title":"Proc. IRE"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1985). Learning Internal Representations by Error Propagation, California Univ San Diego La Jolla Inst for Cognitive Science.","DOI":"10.21236\/ADA164453"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_13","unstructured":"Jarrett, K., Kavukcuoglu, K., Ranzato, M.A., and LeCun, Y. (October, January 29). What is the best multi-stage architecture for object recognition. Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_14","unstructured":"Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., and Schmidhuber, J. (2011, January 6\u201312). Flexible, high performance convolutional neural networks for image classification. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain."},{"key":"ref_15","unstructured":"Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R.R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv."},{"key":"ref_16","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv."},{"key":"ref_17","unstructured":"Liu, R. (2020). Multispectral Images-Based Background Subtraction Using Codebook and Deep Learning Approaches. [Ph.D. 
Thesis, Universit\u00e9 Bourgogne Franche-Comt\u00e9]."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_19","first-page":"1334","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement learning: A survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1177\/0278364913495721","article-title":"Reinforcement learning in robotics: A survey","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_22","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Dresp-Langley, B., Ekseth, O.K., Fesl, J., Gohshi, S., Kurz, M., and Sehring, H.W. (2019). Occam\u2019s Razor for Big Data? On detecting quality in large unstructured datasets. Appl. Sci., 9.","DOI":"10.3390\/app9153065"},{"key":"ref_24","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. 
arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_27","unstructured":"Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., and Graepel, T. (2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1126\/science.aay2400","article-title":"Superhuman AI for multiplayer poker","volume":"365","author":"Brown","year":"2019","journal-title":"Science"},{"key":"ref_29","unstructured":"Berner, C., Brockman, G., Chan, B., Cheung, V., D\u0119biak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., and Hesse, C. (2019). Dota 2 with large scale deep reinforcement learning. arXiv."},{"key":"ref_30","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (June, January 29). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the IEEE international conference on robotics and automation (ICRA), Singapore."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sharma, A.R., and Kaushik, P. (2017, January 5\u20136). Literature survey of statistical, deep and reinforcement learning in natural language processing. Proceedings of the International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India.","DOI":"10.1109\/CCAA.2017.8229841"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yun, S., Choi, J., Yoo, Y., Yun, K., and Young Choi, J. 
(2017, January 21\u201326). Action-decision networks for visual tracking with deep reinforcement learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.148"},{"key":"ref_33","unstructured":"Farazi, N.P., Ahamed, T., Barua, L., and Zou, B. (2015). Deep Reinforcement Learning and Transportation Research: A Comprehensive Review. arXiv."},{"key":"ref_34","unstructured":"Mosavi, A., Ghamisi, P., Faghan, Y., and Duan, P. (2015). Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, Y., Logan, B., Liu, N., Xu, Z., Tang, J., and Wang, Y. (2017, January 23\u201326). Deep reinforcement learning for dynamic treatment regimes on medical registry data. Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI), Park City, UT, USA.","DOI":"10.1109\/ICHI.2017.45"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_37","unstructured":"Bellman, R.E., and Dreyfus, S.E. (2015). Applied Dynamic Programming, Princeton University Press."},{"key":"ref_38","first-page":"1052","article-title":"Stable fitted reinforcement learning","volume":"8","author":"Gordon","year":"1995","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Riedmiller, M. (2005, January 3\u20137). Neural fitted Q iteration\u2014First experiences with a data efficient neural reinforcement learning method. Proceedings of the European Conference on Machine Learning, Porto, Portugal.","DOI":"10.1007\/11564096_32"},{"key":"ref_40","unstructured":"Hasselt, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep reinforcement learning with double Q-Learning. 
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA."},{"key":"ref_41","unstructured":"Bellemare, M.G., Dabney, W., and Munos, R. (2017, January 6\u201311). A distributional perspective on reinforcement learning. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Dabney, W., Rowland, M., Bellemare, M.G., and Munos, R. (2018, January 2\u20137). Distributional reinforcement learning with quantile regression. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11791"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., and Silver, D. (2017). Rainbow: Combining improvements in deep reinforcement learning. arXiv.","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"ref_44","unstructured":"Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever, I. (2017). Evolution strategies as a scalable alternative to reinforcement learning. arXiv."},{"key":"ref_45","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic policy gradient algorithms. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_46","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_47","unstructured":"O\u2019Donoghue, B., Munos, R., Kavukcuoglu, K., and Mnih, V. (2016). Combining policy gradient and Q-learning. 
arXiv."},{"key":"ref_48","first-page":"2863","article-title":"Action-conditional video prediction using deep networks in atari games","volume":"28","author":"Oh","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_49","unstructured":"Nagab, I.A., Kahn, G., Fearing, R.S., and Levine, S. (2018, January 21\u201325). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia."},{"key":"ref_50","unstructured":"Silver, D., Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., and Barreto, A. (2017, January 6\u201311). The predictron: End-to-end learning and planning. Proceedings of the International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_51","first-page":"2154","article-title":"Value iteration networks","volume":"29","author":"Tamar","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_52","unstructured":"Fran\u00e7ois-Lavet, V., Bengio, Y., Precup, D., and Pineau, J. (February, January 27). Combined reinforcement learning via abstract representations. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Fran\u00e7ois-Lavet, V., Henderson, P., Islam, R., Bellemare, M.G., and Pineau, J. (2018). An introduction to deep reinforcement learning. 
arXiv.","DOI":"10.1561\/9781680835397"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1016\/j.neunet.2019.08.014","article-title":"The quantization error in a Self-Organizing Map as a contrast and colour specific indicator of single-pixel change in large random patterns","volume":"119","author":"Wandeto","year":"2019","journal-title":"Neural Netw."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"100433","DOI":"10.1016\/j.imu.2020.100433","article-title":"Pixel precise unsupervised detection of viral particle proliferation in cellular imaging data","volume":"20","author":"Wandeto","year":"2020","journal-title":"Inform. Med. Unlocked"},{"key":"ref_56","unstructured":"Anthony, M., and Bartlett, P.L. (2009). Neural Network Learning: Theoretical Foundations, Cambridge University Press."},{"key":"ref_57","unstructured":"Kakade, S.M. (2003). On the Sample Complexity of Reinforcement Learning. [Ph.D. Thesis, University of London]."},{"key":"ref_58","unstructured":"Sergey, L., Wagener, N., and Abbeel, P. (2015, January 30). Learning contact-rich manipulation skills with guided policy search. Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA."},{"key":"ref_59","unstructured":"(2021, January 22). Learning Contact-Rich Manipulation Skills with Guided Policy Search. Available online: http:\/\/rll.berkeley.edu\/icra2015gps\/."},{"key":"ref_60","first-page":"5048","article-title":"Hindsight experience replay","volume":"30","author":"Andrychowicz","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_61","unstructured":"(2021, January 22). Hindsight Experience Replay. Available online: https:\/\/goo.gl\/SMrQnI."},{"key":"ref_62","unstructured":"Tai, L., Zhang, J., Liu, M., and Burgard, W. (2016). A survey of deep network solutions for learning control in robotics: From reinforcement to imitation. arXiv."},{"key":"ref_63","unstructured":"Bagnell, A.J. (2015). 
An Invitation to Imitation, Robotics Institute, Carnegie Mellon University. Technical Report."},{"key":"ref_64","unstructured":"Vecerik, M., Hester, T., Scholz, J., Wang, F., Pietquin, O., Piot, B., Heess, N., Roth\u00f6rl, T., Lampe, T., and Riedmiller, M. (2017). Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv."},{"key":"ref_65","unstructured":"(2021, January 22). Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. Available online: https:\/\/www.youtube.com\/watch?v=TyOooJC$_$bLY."},{"key":"ref_66","unstructured":"Ho, J., and Ermon, S. (2016). Generative adversarial imitation learning. arXiv."},{"key":"ref_67","unstructured":"Hausman, K., Chebotar, Y., Schaal, S., Sukhatme, G., and Lim, J.J. (2017, January 4\u20139). Multi-modal imitation learning from unstructured demonstrations using generative adversarial nets. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_68","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_69","unstructured":"(2021, January 22). Multi-modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets. Available online: https:\/\/sites.google.com\/view\/nips17intentiongan."},{"key":"ref_70","unstructured":"Spector, B., and Belongie, S. (2018). Sample-efficient reinforcement learning through transfer and architectural priors. arXiv."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1177\/0278364917710318","article-title":"Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection","volume":"37","author":"Levine","year":"2018","journal-title":"Int. J. Robot. Res."},{"key":"ref_72","unstructured":"(2021, January 22). 
Learning Hand-Eye Coordination for Robotic Grasping. Available online: https:\/\/youtu.be\/cXaic$_$k80uM."},{"key":"ref_73","unstructured":"Thrun, S., and Pratt, L. (2012). Learning to Learn, Springer Science & Business Media."},{"key":"ref_74","unstructured":"Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. arXiv."},{"key":"ref_75","unstructured":"Finn, C., Yu, T., Zhang, T., Abbeel, P., and Levine, S. (2017). One-shot visual imitation learning via meta-learning. arXiv."},{"key":"ref_76","unstructured":"(2021, January 22). One-Shot Visual Imitation Learning via Meta-Learning. Available online: https:\/\/sites.google.com\/view\/one-shot-imitation."},{"key":"ref_77","unstructured":"Hanna, J.P., Thomas, P.S., Stone, P., and Niekum, S. (2017). Data-efficient policy evaluation through behavior policy search. arXiv."}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/10\/1\/22\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:14:37Z","timestamp":1760159677000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/10\/1\/22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,24]]},"references-count":77,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["robotics10010022"],"URL":"https:\/\/doi.org\/10.3390\/robotics10010022","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,24]]}}}