{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T04:44:06Z","timestamp":1777092246944,"version":"3.51.4"},"reference-count":34,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,4,2]],"date-time":"2022-04-02T00:00:00Z","timestamp":1648857600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>The foundation and emphasis of robotic manipulator control is Inverse Kinematics (IK). Due to the complexity of derivation, difficulty of computation, and redundancy, traditional IK solutions pose numerous challenges to the operation of a variety of robotic manipulators. This paper develops a Deep Reinforcement Learning (RL) approach for solving the IK problem of a 7-Degree of Freedom (DOF) robotic manipulator using Product of Exponentials (PoE) as a Forward Kinematics (FK) computation tool and the Deep Q-Network (DQN) as an IK solver. The selected approach is architecturally simpler, making it faster and easier to implement, as well as more stable, because it is less sensitive to hyperparameters than continuous action spaces algorithms. The algorithm is designed to produce joint-space trajectories from a given end-effector trajectory. Different network architectures were explored and the output of the DQN was implemented experimentally on a Sawyer robotic arm. The DQN was able to find different trajectories corresponding to a specified Cartesian path of the end-effector. 
The network agent was able to learn random B\u00e9zier and straight-line end-effector trajectories in a reasonable time frame with good accuracy, demonstrating that even though DQN is mainly used in discrete solution spaces, it could be applied to generate joint space trajectories.<\/jats:p>","DOI":"10.3390\/robotics11020044","type":"journal-article","created":{"date-parts":[[2022,4,3]],"date-time":"2022-04-03T06:04:01Z","timestamp":1648965841000},"page":"44","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":54,"title":["A Deep Reinforcement-Learning Approach for Inverse Kinematics Solution of a High Degree of Freedom Robotic Manipulator"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0812-0587","authenticated-orcid":false,"given":"Aryslan","family":"Malik","sequence":"first","affiliation":[{"name":"Aerospace Engineering Department, Embry\u2014Riddle Aeronautical University, Daytona Beach, FL 32114, USA"}]},{"given":"Yevgeniy","family":"Lischuk","sequence":"additional","affiliation":[{"name":"Software\u2014Device OS, Amazon, Austin, TX 78758, USA"}]},{"given":"Troy","family":"Henderson","sequence":"additional","affiliation":[{"name":"Aerospace Engineering Department, Embry\u2014Riddle Aeronautical University, Daytona Beach, FL 32114, USA"}]},{"given":"Richard","family":"Prazenica","sequence":"additional","affiliation":[{"name":"Aerospace Engineering Department, Embry\u2014Riddle Aeronautical University, Daytona Beach, FL 32114, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,2]]},"reference":[{"key":"ref_1","unstructured":"Sridharan, M., and Stone, P. (2007, January 6\u201312). Color Learning on a Mobile Robot: Towards Full Autonomy under Changing Illumination. 
Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI 2007, Hyderabad, India."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yip, M., and Das, N. (2019). Robot autonomy for surgery. The Encyclopedia of MEDICAL ROBOTICS: Volume 1 Minimally Invasive Surgical Robotics, World Scientific.","DOI":"10.1142\/9789813232266_0010"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Beeson, P., and Ames, B. (2015, January 3\u20135). TRAC-IK: An open-source library for improved solving of generic inverse kinematics. Proceedings of the 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), Seoul, Korea.","DOI":"10.1109\/HUMANOIDS.2015.7363472"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Malik, A., Henderson, T., and Prazenica, R. (2021). Multi-Objective Swarm Intelligence Trajectory Generation for a 7 Degree of Freedom Robotic Manipulator. Robotics, 10.","DOI":"10.3390\/robotics10040127"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Collins, T.J., and Shen, W.M. (2017, January 24\u201326). Particle swarm optimization for high-DOF inverse kinematics. Proceedings of the 2017 3rd International Conference on Control, Automation and Robotics (ICCAR), Nagoya, Japan.","DOI":"10.1109\/ICCAR.2017.7942651"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (June, January 29). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2016). Deep reinforcement learning for robotic manipulation. arXiv.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Malik, A. (2021). 
Trajectory Generation for a Multibody Robotic System: Modern Methods Based on Product of Exponentials. [Ph.D. Thesis, Embry-Riddle Aeronautical University].","DOI":"10.2514\/6.2021-2016"},{"key":"ref_9","unstructured":"Matheron, G., Perrin, N., and Sigaud, O. (2019). The problem with DDPG: Understanding failures in deterministic environments with sparse rewards. arXiv."},{"key":"ref_10","unstructured":"Nikishin, E., Izmailov, P., Athiwaratkun, B., Podoprikhin, D., Garipov, T., Shvechikov, P., Vetrov, D., and Wilson, A.G. (2018, January 10). Improving stability in deep reinforcement learning with weight averaging. Proceedings of the UAI 2018 Workshop: Uncertainty in Deep Learning, Monterey, CA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Tiong, T., Saad, I., Teo, K.T.K., and bin Lago, H. (2020, January 28). Deep Reinforcement Learning with Robust Deep Deterministic Policy Gradient. Proceedings of the 2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), Kuala Lumpur, Malaysia.","DOI":"10.1109\/ICECIE50279.2020.9309539"},{"key":"ref_12","unstructured":"Zhang, F., Leitner, J., Milford, M., Upcroft, B., and Corke, P. (2015). Towards vision-based deep reinforcement learning for robotic motion control. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sasaki, H., Horiuchi, T., and Kato, S. (2017, January 19\u201322). A study on vision-based mobile robot learning by deep Q-network. Proceedings of the 2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), Kanazawa, Japan.","DOI":"10.23919\/SICE.2017.8105597"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1049\/trit.2020.0024","article-title":"Multi-robot path planning based on a deep reinforcement learning DQN algorithm","volume":"5","author":"Yang","year":"2020","journal-title":"CAAI Trans. Intell. 
Technol."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xin, J., Zhao, H., Liu, D., and Li, M. (2017, January 20\u201322). Application of deep reinforcement learning in mobile robot path planning. Proceedings of the 2017 Chinese Automation Congress (CAC), Jinan, China.","DOI":"10.1109\/CAC.2017.8244061"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Malik, A., Lischuk, Y., Henderson, T., and Prazenica, R. (2021, January 9\u201311). Generating Constant Screw Axis Trajectories with Quintic Time Scaling for End-Effector Using Artificial Neural Network and Machine Learning. Proceedings of the 2021 IEEE Conference on Control Technology and Applications (CCTA), San Diego, CA, USA.","DOI":"10.1109\/CCTA48906.2021.9658657"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ruan, X., Ren, D., Zhu, X., and Huang, J. (2019, January 3\u20135). Mobile robot navigation based on deep reinforcement learning. Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China.","DOI":"10.1109\/CCDC.2019.8832393"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"104985","DOI":"10.1016\/j.compag.2019.104985","article-title":"Double-DQN based path smoothing and tracking control method for robotic vehicle navigation","volume":"166","author":"Zhang","year":"2019","journal-title":"Comput. Electron. Agric."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Xue, X., Li, Z., Zhang, D., and Yan, Y. (2019, January 12\u201314). A deep reinforcement learning method for mobile robot collision avoidance based on double dqn. Proceedings of the 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, BC, Canada.","DOI":"10.1109\/ISIE.2019.8781522"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Guo, Z., Huang, J., Ren, W., and Wang, C. (2019, January 26\u201328). A reinforcement learning approach for inverse kinematics of arm robot. 
Proceedings of the 2019 4th International Conference on Robotics, Control and Automation, Guangzhou, China.","DOI":"10.1145\/3351180.3351199"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Phaniteja, S., Dewangan, P., Guhan, P., Sarkar, A., and Krishna, K.M. (2017, January 5\u20138). A deep reinforcement learning approach for dynamically stable inverse kinematics of humanoid robots. Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), Macau, China.","DOI":"10.1109\/ROBIO.2017.8324682"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhong, J., Wang, T., and Cheng, L. (2021). Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics. Complex Intell. Syst., 1\u201314.","DOI":"10.1007\/s40747-021-00366-1"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1126\/science.153.3731.34","article-title":"Dynamic programming","volume":"153","author":"Bellman","year":"1966","journal-title":"Science"},{"key":"ref_24","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_25","unstructured":"Robotics, R. (2022, January 10). Sawyer. Available online: https:\/\/sdk.rethinkrobotics.com\/intera\/Main_Page."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lynch, K.M., and Park, F.C. (2017). Modern Robotics, Cambridge University Press.","DOI":"10.1017\/9781316661239"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Malik, A., Henderson, T., and Prazenica, R.J. (2021). Trajectory Generation for a Multibody Robotic System using the Product of Exponentials Formulation. AIAA Scitech 2021 Forum, American Institute of Aeronautics and Astronautics.","DOI":"10.2514\/6.2021-2016"},{"key":"ref_28","unstructured":"Korczyk, J.J., Posada, D., Malik, A., and Henderson, T. 
(2021). Modeling of an On-Orbit Maintenance Robotic Arm Test-Bed. 2021 AAS\/AIAA Astrodynamics Specialist Conference, American Astronautical Society."},{"key":"ref_29","first-page":"3319","article-title":"Using Products of Exponentials to Define (Draw) Orbits and More","volume":"175","author":"Malik","year":"2021","journal-title":"Adv. Astronaut. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_31","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Shi, X., Guo, Z., Huang, J., Shen, Y., and Xia, L. (2020, January 19\u201320). A Distributed Reward Algorithm for Inverse Kinematics of Arm Robot. Proceedings of the 2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE), Dalian, China.","DOI":"10.1109\/CACRE50138.2020.9230347"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Horgan, D., Quan, J., Sendonaris, A., and Dulac-Arnold, G. (2017). Deep q-learning from demonstrations. arXiv.","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"7619483","DOI":"10.1155\/2019\/7619483","article-title":"A dynamic adjusting reward function method for deep reinforcement learning with adjustable parameters","volume":"2019","author":"Hu","year":"2019","journal-title":"Math. Probl. 
Eng."}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/2\/44\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:49:10Z","timestamp":1760136550000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/2\/44"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,2]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["robotics11020044"],"URL":"https:\/\/doi.org\/10.3390\/robotics11020044","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,2]]}}}