{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T15:10:07Z","timestamp":1776265807266,"version":"3.50.1"},"reference-count":28,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,2,15]],"date-time":"2022-02-15T00:00:00Z","timestamp":1644883200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Major Project of the New Generation of Artificial Intelligence","award":["2018AAA0102904"],"award-info":[{"award-number":["2018AAA0102904"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51975059"],"award-info":[{"award-number":["51975059"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Redundant manipulators are widely used in fields such as human-robot collaboration due to their good flexibility. To ensure efficiency and safety, the manipulator is required to avoid obstacles while tracking a desired trajectory in many tasks. Conventional methods for obstacle avoidance of redundant manipulators may encounter joint singularity or exceed joint position limits while tracking the desired trajectory. By integrating deep reinforcement learning into the gradient projection method, a reactive obstacle avoidance method for redundant manipulators is proposed. We establish a general DRL framework for obstacle avoidance, and then a reinforcement learning agent is applied to learn motion in the null space of the redundant manipulator Jacobian matrix. The reward function of reinforcement learning is redesigned to handle multiple constraints automatically. 
Specifically, the manipulability index is introduced into the reward function, and thus the manipulator can maintain high manipulability to avoid joint singularity while executing tasks. To show the effectiveness of the proposed method, a simulation of a 4-degree-of-freedom planar manipulator is given. Compared with the gradient projection method, the proposed method achieves a higher success rate of obstacle avoidance, higher average manipulability, and better time efficiency.<\/jats:p>","DOI":"10.3390\/e24020279","type":"journal-article","created":{"date-parts":[[2022,2,15]],"date-time":"2022-02-15T22:43:22Z","timestamp":1644965002000},"page":"279","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Reinforcement Learning-Based Reactive Obstacle Avoidance Method for Redundant Manipulators"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7338-6045","authenticated-orcid":false,"given":"Yue","family":"Shen","sequence":"first","affiliation":[{"name":"School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China"}]},{"given":"Qingxuan","family":"Jia","sequence":"additional","affiliation":[{"name":"School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China"}]},{"given":"Zeyuan","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China"}]},{"given":"Ruiquan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China"}]},{"given":"Junting","family":"Fei","sequence":"additional","affiliation":[{"name":"School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 
100876, China"}]},{"given":"Gang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hjorth, S., Lachner, J., Stramigioli, S., Madsen, O., and Chrysostomou, D. (January, January 24). An Energy-Based Approach for the Integration of Collaborative Redundant Robots in Restricted Work Environments. Proceedings of the 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9341561"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"132203","DOI":"10.1007\/s11432-019-2735-6","article-title":"Tracking Control of Redundant Manipulator under Active Remote Center-of-Motion Constraints: An RNN-Based Metaheuristic Approach","volume":"64","author":"Khan","year":"2021","journal-title":"Sci. China Inf. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1016\/j.actaastro.2018.04.052","article-title":"Failure Tolerance Strategy of Space Manipulator for Large Load Carrying Tasks","volume":"148","author":"Chen","year":"2018","journal-title":"Acta Astronaut."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1177\/027836498600500106","article-title":"Real-Time Obstacle Avoidance for Manipulators and Mobile Robots","volume":"5","author":"Khatib","year":"1986","journal-title":"Int. J. Robot. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1729881418799562","DOI":"10.1177\/1729881418799562","article-title":"An Improved Artificial Potential Field Method of Trajectory Planning and Obstacle Avoidance for Redundant Manipulators","volume":"15","author":"Wang","year":"2018","journal-title":"Int. J. Adv. Robot. 
Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1109\/TMMS.1969.299896","article-title":"Resolved Motion Rate Control of Manipulators and Human Prostheses","volume":"10","author":"Whitney","year":"1969","journal-title":"IEEE Trans. Man-Mach. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li\u00e9geois, A. (1977). Automatic Supervisory Control of the Configuration and Behavior of Multibody Mechanisms. IEEE Trans. Syst. Man Cybern., 7, 868\u2013871.","DOI":"10.1109\/TSMC.1977.4309644"},{"key":"ref_8","unstructured":"Kucuk, S. (2012). Obstacle Avoidance for Redundant Manipulators as Control Problem. Serial and Parallel Robot Manipulators, IntechOpen. Chapter 11."},{"key":"ref_9","first-page":"475","article-title":"A Weighted Gradient Projection Method for Inverse Kinematics of Redundant Manipulators Considering Multiple Performance Criteria","volume":"64","author":"Wan","year":"2018","journal-title":"Stroj. Vestn. J. Mech. Eng."},{"key":"ref_10","first-page":"6869","article-title":"A Comparison of Damped Least Squares Algorithms for Inverse Kinematics of Robot Manipulators","volume":"50","author":"Natale","year":"2017","journal-title":"IFAC-PapersOnLine"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/TRO.2010.2050655","article-title":"General-Weighted Least-Norm Control for Redundant Manipulators","volume":"26","author":"Xiang","year":"2010","journal-title":"IEEE Trans. Robot."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhang, X., Fan, B., Wang, C., and Cheng, X. (2021). An Improved Weighted Gradient Projection Method for Inverse Kinematics of Redundant Surgical Manipulators. 
Sensors, 21.","DOI":"10.3390\/s21217362"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"78608","DOI":"10.1109\/ACCESS.2020.2990555","article-title":"Novel Method of Obstacle Avoidance Planning for Redundant Sliding Manipulators","volume":"8","author":"Liu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/TRO.2020.3006716","article-title":"Motion Planning Networks: Bridging the Gap Between Learning-Based and Classical Motion Planners","volume":"37","author":"Qureshi","year":"2021","journal-title":"IEEE Trans. Robot."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"47","DOI":"10.3389\/fnbot.2019.00047","article-title":"Deep Recurrent Neural Networks Based Obstacle Avoidance Control for Redundant Manipulators","volume":"13","author":"Xu","year":"2019","journal-title":"Front. Neurorobot."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sangiovanni, B., Rendiniello, A., Incremona, G.P., Ferrara, A., and Piastra, M. (2018, January 12\u201315). Deep Reinforcement Learning for Collision Avoidance of Robotic Manipulators. Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus.","DOI":"10.23919\/ECC.2018.8550363"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kumar, V., Hoeller, D., Sundaralingam, B., Tremblay, J., and Birchfield, S. (October, January 27). Joint Space Control via Deep Reinforcement Learning. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636477"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1007\/s10845-020-01582-1","article-title":"Reinforcement Learning-Based Collision-Free Path Planner for Redundant Robot in Narrow Duct","volume":"32","author":"Hua","year":"2021","journal-title":"J. Intell. 
Manuf."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-Level Control through Deep Reinforcement Learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_20","unstructured":"Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. (December, January 29). Policy Gradient Methods for Reinforcement Learning with Function Approximation. Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_21","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2019). Continuous Control with Deep Reinforcement Learning. arXiv."},{"key":"ref_22","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 7\u20139). Trust Region Policy Optimization. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_23","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_24","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_25","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/027836498500400201","article-title":"Manipulability of Robotic Mechanisms","volume":"4","author":"Yoshikawa","year":"1985","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Luo, S., Kasaei, H., and Schomaker, L. (2020, January 19\u201324). Accelerating Reinforcement Learning for Reaching Using Continuous Curriculum Learning. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK.","DOI":"10.1109\/IJCNN48605.2020.9207427"},{"key":"ref_28","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/2\/279\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:20:22Z","timestamp":1760134822000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/2\/279"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,15]]},"references-count":28,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["e24020279"],"URL":"https:\/\/doi.org\/10.3390\/e24020279","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,15]]}}}