{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T12:03:01Z","timestamp":1772798581047,"version":"3.50.1"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:00:00Z","timestamp":1756425600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:00:00Z","timestamp":1756425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["92048205"],"award-info":[{"award-number":["92048205"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["TRR-169"],"award-info":[{"award-number":["TRR-169"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Visual servoing is a fundamental approach for robotic manipulation that relies on visual feedback to precisely control robot motion. Most methods are capable of generating velocity control signals to guide the camera to the desired position and orientation, which often exhibit limitations in dynamic responsiveness and robustness against noise and unmodeled dynamics. This paper presents an innovative acceleration-level position-based visual servoing control framework enhanced by deep reinforcement learning (DRL) integrated with Transformer-based temporal sequence processing. The essence of the method comprises two key elements: First, the controller retains the theoretical approach of position-based visual servoing in its design, ensuring transparency and a guaranteed performance baseline. Second, considering the temporal characteristics of servoing control, a Transformer-based actor-critic architecture within a Proximal Policy Optimization (PPO) reinforcement learning scheme is proposed to improve the learning efficiency and performance. Comprehensive experiments are conducted in both simulation and real robot scenarios. The results reveal that, compared with traditional velocity-level controllers, the proposed method demonstrates superior dynamic characteristics, enhanced tracking performance, and diminished sensitivity to noise in Cartesian space.<\/jats:p>","DOI":"10.1007\/s40747-025-02056-8","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T06:16:01Z","timestamp":1756448161000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Enhancing position-based visual servoing performance through transformer-based acceleration-level reinforcement learning"],"prefix":"10.1007","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0169-8896","authenticated-orcid":false,"given":"Zongdao","family":"Li","sequence":"first","affiliation":[]},{"given":"Wenkai","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Siqin","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Pengfei","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Ye","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Tao","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Qingdu","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"issue":"5","key":"2056_CR1","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1109\/70.538972","volume":"12","author":"S Hutchinson","year":"1996","unstructured":"Hutchinson S, Hager GD, Corke PI (1996) A tutorial on visual servo control. IEEE Trans Robot Autom 12(5):651\u2013670","journal-title":"IEEE Trans Robot Autom"},{"issue":"4","key":"2056_CR2","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/MRA.2006.250573","volume":"13","author":"F Chaumette","year":"2006","unstructured":"Chaumette F, Hutchinson S (2006) Visual servo control. i. basic approaches. IEEE Robot Autom Mag 13(4):82\u201390","journal-title":"IEEE Robot Autom Mag"},{"issue":"5","key":"2056_CR3","doi-asserted-by":"publisher","first-page":"1657","DOI":"10.1109\/TCST.2014.2380175","volume":"23","author":"P Cigliano","year":"2015","unstructured":"Cigliano P, Lippiello V, Ruggiero F, Siciliano B (2015) Robotic ball catching with an eye-in-hand single-camera system. IEEE Trans Control Syst Technol 23(5):1657\u20131671","journal-title":"IEEE Trans Control Syst Technol"},{"issue":"10","key":"2056_CR4","doi-asserted-by":"publisher","first-page":"3016","DOI":"10.1109\/TAC.2018.2793458","volume":"63","author":"X Liang","year":"2018","unstructured":"Liang X, Wang H, Liu Y-H, Chen W, Jing Z (2018) Image-based position control of mobile robots with a completely unknown fixed camera. IEEE Trans Autom Control 63(10):3016\u20133023","journal-title":"IEEE Trans Autom Control"},{"issue":"10","key":"2056_CR5","doi-asserted-by":"publisher","first-page":"4735","DOI":"10.1109\/TIE.2011.2179270","volume":"59","author":"D Park","year":"2012","unstructured":"Park D, Kwon J-H, Ha I-J (2012) Novel position-based visual servoing approach to robust global stability under field-of-view constraint. IEEE Trans Industr Electron 59(10):4735\u20134752","journal-title":"IEEE Trans Industr Electron"},{"issue":"1","key":"2056_CR6","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1109\/TRO.2009.2033332","volume":"26","author":"E Malis","year":"2010","unstructured":"Malis E, Mezouar Y, Rives P (2010) Robustness of image-based visual servoing with a calibrated camera in the presence of uncertainties in the three-dimensional structure. IEEE Trans Rob 26(1):112\u2013120","journal-title":"IEEE Trans Rob"},{"issue":"11","key":"2056_CR7","doi-asserted-by":"publisher","first-page":"5419","DOI":"10.1109\/TNNLS.2018.2802650","volume":"29","author":"Y Zhang","year":"2018","unstructured":"Zhang Y, Li S (2018) A neural controller for image-based visual servoing of manipulators with physical constraints. IEEE Trans Neural Netw Learn Syst 29(11):5419\u20135429","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"1","key":"2056_CR8","first-page":"1334","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. J Mach Learn Res 17(1):1334\u20131373","journal-title":"J Mach Learn Res"},{"issue":"11","key":"2056_CR9","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1177\/0278364913495721","volume":"32","author":"J Kober","year":"2013","unstructured":"Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238\u20131274","journal-title":"Int J Robot Res"},{"key":"2056_CR10","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT press, London"},{"issue":"1","key":"2056_CR11","first-page":"1334","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. J Mach Learn Res 17(1):1334\u20131373","journal-title":"J Mach Learn Res"},{"key":"2056_CR12","unstructured":"Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. International Conference on Machine Learning. pp 1861\u20131870"},{"key":"2056_CR13","unstructured":"Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, Quillen D, Holly E, Kalakrishnan M, Vanhoucke V, Levine S (2018) Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on Robot Learning, PMLR. pp. 651\u2013673"},{"key":"2056_CR14","doi-asserted-by":"crossref","unstructured":"Peng XB, Andrychowicz M, Zaremba W, Abbeel P (2018) Sim-to-real transfer of robotic control with dynamics randomization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE. pp. 3803\u20133810","DOI":"10.1109\/ICRA.2018.8460528"},{"key":"2056_CR15","doi-asserted-by":"crossref","unstructured":"Zhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp 737\u2013744","DOI":"10.1109\/SSCI47803.2020.9308468"},{"issue":"26","key":"2056_CR16","doi-asserted-by":"publisher","first-page":"5872","DOI":"10.1126\/scirobotics.aau5872","volume":"4","author":"J Hwangbo","year":"2019","unstructured":"Hwangbo J, Lee J, Hutter M (2019) Learning agile and dynamic motor skills for legged robots. Sci Robot 4(26):5872","journal-title":"Sci Robot"},{"key":"2056_CR17","doi-asserted-by":"crossref","unstructured":"Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). pp 23\u201330","DOI":"10.1109\/IROS.2017.8202133"},{"issue":"1","key":"2056_CR18","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1177\/0278364919887447","volume":"39","author":"M Andrychowicz","year":"2020","unstructured":"Andrychowicz M, Baker B, Chociej M, Jozefowicz R, McGrew B, Pachocki J, Petron A, Plappert M, Powell G, Ray A et al (2020) Learning dexterous in-hand manipulation. Int J Robot Res 39(1):3\u201320","journal-title":"Int J Robot Res"},{"key":"2056_CR19","doi-asserted-by":"publisher","unstructured":"Akkaya I, Andrychowicz M, Chociej M, Litwin M, McGrew B, Petron A, Paino A, Plappert M, Powell G, Ribas R, Schneider J, Tezak N, Tworek J, Welinder P, Weng L, Yuan Q, Zaremba W, Zhang L (2019) Solving rubik\u2019s cube with a robot hand. arXiv preprint arXiv:1910.07113https:\/\/doi.org\/10.48550\/arXiv.1910.07113. https:\/\/arxiv.org\/abs\/1910.07113","DOI":"10.48550\/arXiv.1910.07113"},{"issue":"5","key":"2056_CR20","doi-asserted-by":"publisher","first-page":"2784","DOI":"10.1109\/TCYB.2023.3310505","volume":"54","author":"W Chen","year":"2023","unstructured":"Chen W, Zeng C, Liang H, Sun F, Zhang J (2023) Multimodality driven impedance-based sim2real transfer learning for robotic multiple peg-in-hole assembly. IEEE Trans Cybern 54(5):2784\u20132797","journal-title":"IEEE Trans Cybern"},{"issue":"11","key":"2056_CR21","doi-asserted-by":"publisher","first-page":"5419","DOI":"10.1109\/TNNLS.2018.2802650","volume":"29","author":"Y Zhang","year":"2018","unstructured":"Zhang Y, Li S (2018) A neural controller for image-based visual servoing of manipulators with physical constraints. IEEE Trans Neural Netw Learn Syst 29(11):5419\u20135429","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"12","key":"2056_CR22","doi-asserted-by":"publisher","first-page":"5272","DOI":"10.1109\/TNNLS.2020.2965553","volume":"31","author":"W Li","year":"2020","unstructured":"Li W, Chiu PWY, Li Z (2020) An accelerated finite-time convergent neural network for visual servoing of a flexible surgical endoscope with physical and rcm constraints. IEEE Trans Neural Netw Learn Syst 31(12):5272\u20135284","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"1","key":"2056_CR23","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1109\/LRA.2023.3331894","volume":"9","author":"EG Ribeiro","year":"2024","unstructured":"Ribeiro EG, Mendes RQ, Terra MH, Grassi V (2024) Second-order position-based visual servoing of a robot manipulator. IEEE Robot Autom Lett 9(1):207\u2013214","journal-title":"IEEE Robot Autom Lett"},{"issue":"10","key":"2056_CR24","doi-asserted-by":"publisher","first-page":"5444","DOI":"10.1109\/TIE.2014.2300048","volume":"61","author":"M Keshmiri","year":"2014","unstructured":"Keshmiri M, Xie W-F, Mohebbi A (2014) Augmented image-based visual servoing of a manipulator using acceleration command. IEEE Trans Industr Electron 61(10):5444\u20135452","journal-title":"IEEE Trans Industr Electron"},{"issue":"4","key":"2056_CR25","doi-asserted-by":"publisher","first-page":"5197","DOI":"10.1109\/LRA.2020.3004793","volume":"5","author":"F Fusco","year":"2020","unstructured":"Fusco F, Kermorgant O, Martinet P (2020) Integrating features acceleration in visual predictive control. IEEE Robot Autom Lett 5(4):5197\u20135204","journal-title":"IEEE Robot Autom Lett"},{"issue":"10","key":"2056_CR26","doi-asserted-by":"publisher","first-page":"8214","DOI":"10.1109\/TIE.2018.2881948","volume":"66","author":"A Anwar","year":"2019","unstructured":"Anwar A, Lin W, Deng X, Qiu J, Gao H (2019) Quality inspection of remote radio units using depth-free image-based visual servo with acceleration command. IEEE Trans Industr Electron 66(10):8214\u20138223","journal-title":"IEEE Trans Industr Electron"},{"key":"2056_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.measurement.2020.108137","volume":"166","author":"J Yang","year":"2020","unstructured":"Yang J, Xie Z, Chen L, Liu M (2020) An acceleration-level visual servoing scheme for robot manipulator with iot and sensors using recurrent neural network. Measurement 166:108137","journal-title":"Measurement"},{"issue":"4","key":"2056_CR28","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/MRA.2006.250573","volume":"13","author":"F Chaumette","year":"2006","unstructured":"Chaumette F, Hutchinson S (2006) Visual servo control basic. i. approaches. IEEE Robot Autom Mag 13(4):82\u201390","journal-title":"IEEE Robot Autom Mag"},{"issue":"1","key":"2056_CR29","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1109\/MRA.2007.339609","volume":"14","author":"S Hutchinson","year":"2007","unstructured":"Hutchinson S, Chaumette F (2007) Visual servo control, part ii: Advanced approaches. IEEE Robot Autom Mag 14(1):109\u2013118","journal-title":"IEEE Robot Autom Mag"},{"issue":"2","key":"2056_CR30","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1109\/70.760345","volume":"15","author":"E Malis","year":"1999","unstructured":"Malis E, Chaumette F, Boudet S (1999) 2 1\/2 d visual servoing. IEEE Trans Robot Autom 15(2):238\u2013250","journal-title":"IEEE Trans Robot Autom"},{"key":"2056_CR31","doi-asserted-by":"crossref","unstructured":"Mansard N, Chaumette F (2007) Task sequencing for high-level sensor-based control. IEEE Trans Rob 23(1):60\u201372","DOI":"10.1109\/TRO.2006.889487"},{"key":"2056_CR32","doi-asserted-by":"crossref","unstructured":"Li S, Xie W, Gao Y (2017) Enhanced ibvs controller for a 6dof manipulator using hybrid pd-smc method. In: IECON 2017-43rd Annual Conference of the IEEE Industrial Electronics Society, IEEE. pp. 2852\u20132857","DOI":"10.1109\/IECON.2017.8216481"},{"key":"2056_CR33","doi-asserted-by":"crossref","unstructured":"Siradjuddin I, Behera L, McGinnity TM, Coleman S (2013) Image-based visual servoing of a 7-dof robot manipulator using an adaptive distributed fuzzy pd controller. IEEE\/ASME Trans Mechatron 19(2):512\u2013523","DOI":"10.1109\/TMECH.2013.2245337"},{"issue":"3","key":"2056_CR34","doi-asserted-by":"publisher","first-page":"7239","DOI":"10.3182\/20140824-6-ZA-1003.01742","volume":"47","author":"A Mohebbi","year":"2014","unstructured":"Mohebbi A, Keshmiri M, Xie W-F (2014) An acceleration command approach to robotic stereo image-based visual servoing. IFAC Proc Vol 47(3):7239\u20137245","journal-title":"IFAC Proc Vol"},{"key":"2056_CR35","doi-asserted-by":"publisher","first-page":"1367","DOI":"10.1007\/978-3-030-63416-2_281","volume-title":"Computer vision: a reference guide","author":"F Chaumette","year":"2021","unstructured":"Chaumette F (2021) Visual servoing. Computer vision: a reference guide. Springer, Cham, pp 1367\u20131374"},{"key":"2056_CR36","doi-asserted-by":"crossref","unstructured":"Pinto L, Andrychowicz M, Welinder P, Zaremba W, Abbeel P (2017) Asymmetric actor critic for image-based robot learning. Robotics: Science and Systems, https:\/\/www.roboticsproceedings.org\/rss14\/p08.pdf, MIT Press Journals","DOI":"10.15607\/RSS.2018.XIV.008"},{"issue":"6","key":"2056_CR37","doi-asserted-by":"publisher","first-page":"3434","DOI":"10.1002\/asjc.2769","volume":"24","author":"J Gu","year":"2022","unstructured":"Gu J, Wang W, Li A, Zhu M, Cao L, Xu Z (2022) A review of visual servoing approaches based on deep reinforcement learning. Asian J Control 24(6):3434\u20133450","journal-title":"Asian J Control"},{"key":"2056_CR38","doi-asserted-by":"publisher","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. Preprint, https:\/\/doi.org\/10.48550\/arXiv.1707.06347, arXiv:1707.06347","DOI":"10.48550\/arXiv.1707.06347"},{"key":"2056_CR39","doi-asserted-by":"publisher","first-page":"490","DOI":"10.1016\/j.neucom.2018.11.029","volume":"330","author":"X Shi","year":"2019","unstructured":"Shi X, Cheng Y, Yin C, Huang X, Zhong S-M (2019) Design of adaptive backstepping dynamic surface control method with rbf neural network for uncertain nonlinear system. Neurocomputing 330:490\u2013503","journal-title":"Neurocomputing"},{"key":"2056_CR40","first-page":"19884","volume":"33","author":"M Laskin","year":"2020","unstructured":"Laskin M, Lee K, Stooke A, Pinto L, Abbeel P, Srinivas A (2020) Reinforcement learning with augmented data. Adv Neural Inf Process Syst 33:19884\u201319895","journal-title":"Adv Neural Inf Process Syst"},{"issue":"6","key":"2056_CR41","doi-asserted-by":"publisher","first-page":"3434","DOI":"10.1002\/asjc.2769","volume":"24","author":"J Gu","year":"2022","unstructured":"Gu J, Wang W, Li A, Zhu M, Cao L, Xu Z (2022) Homography-based uncalibrated visual servoing with neural-network-assisted robust filtering scheme and adaptive servo gain. Asian J Control 24(6):3434\u20133455. https:\/\/doi.org\/10.1002\/asjc.2769","journal-title":"Asian J Control"},{"key":"2056_CR42","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (long and Short Papers), pp. 4171\u20134186"},{"key":"2056_CR43","unstructured":"Rudin N, Hoeller D, Reist P, Hutter M (2022) Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning, PMLR. pp 91\u2013100"},{"key":"2056_CR44","doi-asserted-by":"crossref","unstructured":"Todorov E, Erez T, Tassa Y (2012) Mujoco: A physics engine for model-based control. In: 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, pp. 5026\u20135033","DOI":"10.1109\/IROS.2012.6386109"},{"key":"2056_CR45","doi-asserted-by":"crossref","unstructured":"Bambade A, El-Kazdadi S, Taylor A, Carpentier J (2022) Prox-qp: Yet another quadratic programming solver for robotics and beyond. In: RSS 2022-Robotics: Science and Systems https:\/\/www.roboticsproceedings.org\/rss18\/p040.pdf, MIT Press Journals","DOI":"10.15607\/RSS.2022.XVIII.040"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02056-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-025-02056-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02056-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T13:32:36Z","timestamp":1758807156000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-025-02056-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,29]]},"references-count":45,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["2056"],"URL":"https:\/\/doi.org\/10.1007\/s40747-025-02056-8","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,29]]},"assertion":[{"value":"18 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 August 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no Conflict of interest to declare that are relevant to the content of this article. Financial or non-financial interests: The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This study does not involve any human participants or animals.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}}],"article-number":"430"}}