{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T13:43:11Z","timestamp":1762177391515,"version":"3.41.2"},"reference-count":29,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,2,9]],"date-time":"2021-02-09T00:00:00Z","timestamp":1612828800000},"content-version":"vor","delay-in-days":39,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1903214","62071339","61671332","U1736206"],"award-info":[{"award-number":["U1903214","62071339","61671332","U1736206"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Complexity"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>This paper proposes an online solution for the optimal tracking control of robotic systems based on a single critic neural network (NN)\u2010based reinforcement learning (RL) method. To this end, we rewrite the robotic system model in state\u2010space form, which facilitates the synthesis of the optimal tracking control. To maintain the tracking response, a steady\u2010state control is designed, and then an adaptive optimal tracking control is used to ensure that the tracking error converges in an optimal sense. To solve the obtained optimal control problem within the framework of adaptive dynamic programming (ADP), the command trajectory to be tracked and the modified tracking Hamilton\u2010Jacobi\u2010Bellman (HJB) equation are formulated. An online RL algorithm is then developed to address the HJB equation using a critic NN with an online learning algorithm. 
Simulation results are given to verify the effectiveness of the proposed method.<\/jats:p>","DOI":"10.1155\/2021\/8839391","type":"journal-article","created":{"date-parts":[[2021,2,10]],"date-time":"2021-02-10T01:41:33Z","timestamp":1612921293000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Online Optimal Control of Robotic Systems with Single Critic NN\u2010Based Reinforcement Learning"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7810-2697","authenticated-orcid":false,"given":"Xiaoyi","family":"Long","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7700-0901","authenticated-orcid":false,"given":"Zheng","family":"He","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9796-488X","authenticated-orcid":false,"given":"Zhongyuan","family":"Wang","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,2,9]]},"reference":[{"volume-title":"Reinforcement Learning: An Introduction","year":"1998","author":"Sutton R. S.","key":"e_1_2_9_1_2"},{"volume-title":"Reinforcement Learning and Approximate Dynamic Programming for Feedback Control","year":"2013","author":"Lewis F. L.","key":"e_1_2_9_2_2"},{"key":"e_1_2_9_3_2","doi-asserted-by":"publisher","DOI":"10.1002\/9780470182963"},{"volume-title":"Adaptive Dynamic Programming for Control: Algorithms and Stability","year":"2012","author":"Zhang H.","key":"e_1_2_9_4_2"},{"key":"e_1_2_9_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2012.09.019"},{"key":"e_1_2_9_6_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207179.2015.1060362"},{"key":"e_1_2_9_7_2","doi-asserted-by":"crossref","unstructured":"Chen Anthony S. and Guido H. 
Adaptive optimal control via continuous-time Q-learning for unknown nonlinear affine systems Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC) December 2019 Nice France IEEE 1007\u20131012.","DOI":"10.1109\/CDC40024.2019.9030116"},{"key":"e_1_2_9_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2012.06.008"},{"key":"e_1_2_9_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2004.11.034"},{"key":"e_1_2_9_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2009.03.008"},{"key":"e_1_2_9_11_2","doi-asserted-by":"publisher","DOI":"10.1002\/rnc.3018"},{"key":"e_1_2_9_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2012.06.096"},{"key":"e_1_2_9_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2582849"},{"key":"e_1_2_9_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-1712-5_12"},{"key":"e_1_2_9_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2013.09.043"},{"key":"e_1_2_9_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2014.05.011"},{"key":"e_1_2_9_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isatra.2019.08.025"},{"key":"e_1_2_9_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2014.7004668"},{"key":"e_1_2_9_19_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cta.2015.0590"},{"key":"e_1_2_9_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/tac.2014.2317301"},{"key":"e_1_2_9_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfranklin.2019.07.022"},{"key":"e_1_2_9_22_2","first-page":"399","article-title":"Novel robot manipulator adaptive artificial control: design a novel siso adaptive fuzzy sliding algorithm inverse dynamic like method","volume":"5","author":"Piltan F.","year":"2011","journal-title":"International Journal of 
Engineering"},{"key":"e_1_2_9_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/37.980247"},{"key":"e_1_2_9_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/87.553662"},{"key":"e_1_2_9_25_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207721.2014.906681"},{"key":"e_1_2_9_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/tase.2013.2296206"},{"key":"e_1_2_9_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.2018.2861826"},{"key":"e_1_2_9_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.02.025"},{"key":"e_1_2_9_29_2","doi-asserted-by":"publisher","DOI":"10.1002\/rnc.3247"}],"container-title":["Complexity"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/8839391.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/8839391.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/8839391","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T21:52:46Z","timestamp":1723240366000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/8839391"}},"subtitle":[],"editor":[{"given":"Jing","family":"Na","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/8839391"],"URL":"https:\/\/doi.org\/10.1155\/2021\/8839391","archive":["Portico"],"relation":{},"ISSN":["1076-2787","1099-0526"],"issn-type":[{"type":"print","value":"1076-2787"},{"type":"electronic","value":"1099-0526"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value"
:"2020-08-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"8839391"}}