{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T18:25:20Z","timestamp":1774635920332,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,10,20]],"date-time":"2022-10-20T00:00:00Z","timestamp":1666224000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science and Technology Council, Taiwan","award":["MOST 108-2221-E-011 -166 -MY3"],"award-info":[{"award-number":["MOST 108-2221-E-011 -166 -MY3"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Reinforcement Learning (RL) is gaining much research attention because it allows the system to learn from interacting with the environment. Yet, with all these successful applications, the application of RL in direct joint torque control without the help of an underlining dynamic model is not reported in the literature. This study presents a split network structure that enables successful training of RL to learn the direct torque control for trajectory following a six-axis articulated robot without prior knowledge of the dynamic robot model. The training took a very long time to converge. However, we were able to show the successful control of four different trajectories without needing an accurate dynamics model and complex inverse kinematics computation. To show the RL-based control\u2019s effectiveness, we also compare the RL control with the Model Predictive Control (MPC), another popular trajectory control method. Our results show that while the MPC achieves smoother and more accurate control, it does not automatically treat the singularity. In addition, it requires complex inverse dynamics calculations. 
On the other hand, the RL controller instinctively avoided violent actions around the singularities.<\/jats:p>","DOI":"10.3390\/robotics11050116","type":"journal-article","created":{"date-parts":[[2022,10,21]],"date-time":"2022-10-21T00:34:30Z","timestamp":1666312470000},"page":"116","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Trajectory Control of An Articulated Robot Based on Direct Reinforcement Learning"],"prefix":"10.3390","volume":"11","author":[{"given":"Chia-Hao","family":"Tsai","sequence":"first","affiliation":[{"name":"Department of Mechanical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106216, Taiwan"}]},{"given":"Jun-Ji","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106216, Taiwan"}]},{"given":"Teng-Feng","family":"Hsieh","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106216, Taiwan"}]},{"given":"Jia-Yush","family":"Yen","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering, National Taiwan University of Science and Technology, No. 43, Sec. 4, Taipei 106335, Taiwan"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Luo, J., Solowjow, E., Wen, C., Ojea, J.A., and Agogino, A.M. (2018, January 1\u20135). Deep Reinforcement Learning for Robotic Assembly of Mixed Deformable and Rigid Objects. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594353"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Singh, A., and Nandi, G.C. (2018, January 26\u201328). 
Machine Learning based Joint Torque calculations of Industrial Robots. Proceedings of the 2018 Conference on Information and Communication Technology (CICT), Jabalpur, India.","DOI":"10.1109\/INFOCOMTECH.2018.8722353"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1007\/s00170-012-4715-x","article-title":"Propagation of assembly errors in multitasking machines by the homogenous matrix method","volume":"68","author":"Ugalde","year":"2013","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1016\/S0166-3615(97)00015-8","article-title":"Learning the peg-into-hole assembly operation with a connectionist reinforcement technique","volume":"33","author":"Nuttin","year":"1997","journal-title":"Comput. Ind."},{"key":"ref_5","unstructured":"Schaal, S. (1997). Learning from demonstration. Advances in Neural Information Processing Systems, Morgan Kaufmann Publishers Inc."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Peters, J., and Schaal, S. (2006, January 9\u201315). Policy gradient methods for robotics. Proceedings of the 2006 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Beijing, China.","DOI":"10.1109\/IROS.2006.282564"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Peters, J., and Schaal, S. (2007, January 20\u201324). Reinforcement learning by reward-weighted regression for operational space control. Proceedings of the 24th International Conference on Machine Learning, Corvalis, OR, USA.","DOI":"10.1145\/1273496.1273590"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kober, J., Oztop, E., and Peters, J. (2011, January 16\u201322). Reinforcement learning to adjust robot movements to new situations. 
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain.","DOI":"10.7551\/mitpress\/9123.003.0009"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (June, January 29). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"9355","DOI":"10.1109\/TIE.2017.2711551","article-title":"An Efficient Insertion Control Method for Precision Assembly of Cylindrical Components","volume":"64","author":"Liu","year":"2017","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_11","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic policy gradient algorithms. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"122","DOI":"10.3390\/robotics2030122","article-title":"Reinforcement Learning in Robotics: Applications and Real-World Challenges","volume":"2","author":"Kormushev","year":"2013","journal-title":"Robotics"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1726","DOI":"10.1631\/FITEE.1900533","article-title":"Deep reinforcement learning: A survey","volume":"21","author":"Wang","year":"2020","journal-title":"Front. Inf. Technol. Electron. Eng."},{"key":"ref_14","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016, January 2\u20134). Continuous control with deep reinforcement learning. 
Proceedings of the 4th International Conference on Learning Representations, ICLR 2016\u2014Conference Track Proceedings, San Juan, Puerto Rico."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep reinforcement learning with double Q-Learning. Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI 2016), Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_16","unstructured":"Ansehel, O., Baram, N., and Shimkin, N. (2017, January 6\u201311). Averaged-DQN: Variance reduction and stabilization for Deep Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lee, D., Defourny, B., and Powell, W.B. (2013, January 16\u201319). Bias-corrected Q-learning to control max-operator bias in Q-learning. Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Singapore.","DOI":"10.1109\/ADPRL.2013.6614994"},{"key":"ref_18","unstructured":"He, F.S., Liu, Y., Schwing, A.G., and Peng, J. (2017, January 24\u201326). Learning to play in a day: Faster deep reinforcement learning by optimality tightening. Proceedings of the 5th International Conference on Learning Representations, ICLR 2017\u2014Conference Track Proceedings, Toulon, France."},{"key":"ref_19","unstructured":"Nachum, O., Norouzi, M., Tucker, G., and Schuurmans, D. (2018, January 10\u201315). Smoothed action value functions for learning Gaussian policies. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"De Asis, K., Hernandez-Garcia, J., Holland, G., and Sutton, R. (2018, January 2\u20137). Multi-step reinforcement learning: A unifying algorithm. 
Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11631"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/TNNLS.2019.2919338","article-title":"An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning","volume":"31","author":"Wunsch","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_22","unstructured":"Wang, D., and Hu, M. (2021). Deep Deterministic Policy Gradient With Compatible Critic Network. IEEE Trans. Neural Netw. Learn. Syst., 1\u201313."},{"key":"ref_23","unstructured":"Ghavamzadeh, M., and Mahadevan, S. (July, January 28). Continuous-Time Hierarchical Reinforcement Learning. Proceedings of the Eighteenth International Conference on Machine Learning, Williamstown, MA, USA."},{"key":"ref_24","unstructured":"Tiganj, Z., Shankar, K.H., and Howard, M.W. (1997, January 24\u201325). Scale invariant value computation for reinforcement learning in continuous time. Proceedings of the AAAI Spring Symposium, Palo Alto, CA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Jiao, Z., and Oh, J. (2020). A Real-Time Actor-Critic Architecture for Continuous Control. Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices, Springer International Publishing.","DOI":"10.1007\/978-3-030-55789-8_47"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Chen, S., and Wen, J. (2021). Industrial Robot Trajectory Tracking Control Using Multi-Layer Neural Networks Trained by Iterative Learning Control. Robotics, 10.","DOI":"10.3390\/robotics10010050"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1080\/00207178308932998","article-title":"Stabilizing state-feedback design via the moving horizon method","volume":"37","author":"Kwon","year":"1983","journal-title":"Int. J. 
Control"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1109\/TIE.2013.2258292","article-title":"Using Neural Network Model Predictive Control for Controlling Shape Memory Alloy-Based Manipulator","volume":"61","author":"Nikdel","year":"2014","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1016\/0005-1098(89)90002-2","article-title":"Model predictive control: Theory and practice\u2014A survey","volume":"25","author":"Prett","year":"1989","journal-title":"Automatica"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"981","DOI":"10.3182\/20140824-6-ZA-1003.01631","article-title":"Model Predictive Control for Power System Frequency Control Taking into Account Imbalance Uncertainty","volume":"47","author":"Ersdal","year":"2014","journal-title":"IFAC Proc. Vol."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chen, J., Tang, C., Xin, L., Li, S.E., and Tomizuka, M. (2018, January 26\u201330). Continuous Decision Making for On-road Autonomous Driving under Uncertain and Interactive Environments. Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China.","DOI":"10.1109\/IVS.2018.8500605"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tang, Q., Chu, Z., Qiang, Y., Wu, S., and Zhou, Z. (2020, January 22\u201326). Trajectory Tracking of Robotic Manipulators with Constraints Based on Model Predictive Control. Proceedings of the 2020 17th International Conference on Ubiquitous Robots (UR), Kyoto, Japan.","DOI":"10.1109\/UR49135.2020.9144943"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lu, C., Wang, K., and Xu, H. (2020, January 27\u201329). Trajectory Tracking of Manipulators Based on Improved Robust Nonlinear Predictive Control. 
Proceedings of the 2020 1st International Conference on Control, Robotics and Intelligent System, Xiamen, China.","DOI":"10.1145\/3437802.3437804"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"62380","DOI":"10.1109\/ACCESS.2021.3074741","article-title":"Practical Model Predictive Control for a Class of Nonlinear Systems Using Linear Parameter-Varying Representations","volume":"9","author":"Abbas","year":"2021","journal-title":"IEEE Access"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/MRA.2016.2580591","article-title":"A New Soft Robot Control Method: Using Model Predictive Control for a Pneumatically Actuated Humanoid","volume":"23","author":"Best","year":"2016","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lunni, D., Santamaria-Navarro, A., Rossi, R., Rocco, P., Bascetta, L., and Andrade-Cetto, J. (2017, January 13\u201316). Nonlinear model predictive control for aerial manipulation. Proceedings of the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), Miami, FL, USA.","DOI":"10.1109\/ICUAS.2017.7991347"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Guechi, E.-H., Bouzoualegh, S., Zennir, Y., and Bla\u017ei\u010d, S. (2018). MPC Control and LQ Optimal Control of A Two-Link Robot Arm: A Comparative Study. Machines, 6.","DOI":"10.3390\/machines6030037"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Guechi, E.H., Bouzoualegh, S., Messikh, L., and Bla\u017eic, S. (2018, January 22\u201325). Model predictive control of a two-link robot arm. Proceedings of the 2018 International Conference on Advanced Systems and Electric Technologies (IC_ASET), Hammamet, Tunisia.","DOI":"10.1109\/ASET.2018.8379891"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Car, M., Ivanovic, A., Orsag, M., and Bogdan, S. (2018, January 1\u20135). Impedance Based Force Control for Aerial Robot Peg-in-Hole Insertion Tasks. 
Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593808"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/5\/116\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:58:27Z","timestamp":1760144307000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/5\/116"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,20]]},"references-count":39,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["robotics11050116"],"URL":"https:\/\/doi.org\/10.3390\/robotics11050116","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,20]]}}}