{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T18:16:06Z","timestamp":1780424166408,"version":"3.54.1"},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T00:00:00Z","timestamp":1710460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Traditional trajectory learning methods based on Imitation Learning (IL) only learn the existing trajectory knowledge from human demonstration. As a result, they cannot adapt the trajectory knowledge to the task environment by interacting with the environment and fine-tuning the policy. To address this problem, a global trajectory learning method that combines IL with Reinforcement Learning (RL) to adapt the knowledge policy to the environment is proposed. In this paper, IL is used to acquire basic trajectory skills, and the agent then explores and exploits, through RL, policies that are better suited to the current environment. The basic trajectory skills include the knowledge policy and the time stage information in the whole task space to help learn the time series of the trajectory, and are used to guide the subsequent RL process. Notably, neural networks are not used to model the action policy and the Q value of RL during the RL process. Instead, they are sampled and updated in the whole task space and then transferred to the networks after the RL process through Behavior Cloning (BC) to obtain a continuous and smooth global trajectory policy. The feasibility and effectiveness of the method were validated in a custom Gym environment of a flower drawing task. 
We then executed the learned policy in a real-world robot drawing experiment.<\/jats:p>","DOI":"10.3389\/fnbot.2024.1368243","type":"journal-article","created":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T04:33:50Z","timestamp":1710477230000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Human skill knowledge guided global trajectory policy reinforcement learning method"],"prefix":"10.3389","volume":"18","author":[{"given":"Yajing","family":"Zang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pengfei","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fusheng","family":"Zha","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wei","family":"Guo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chuanfeng","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lining","family":"Sun","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2024,3,15]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"429","DOI":"10.3390\/rs15020429","article-title":"Energy-efficient multi-uavs cooperative trajectory optimization for communication coverage: an MADRL approach","volume":"15","author":"Ao","year":"2023","journal-title":"Rem. Sens"},{"key":"B2","doi-asserted-by":"publisher","first-page":"3326","DOI":"10.1109\/LRA.2023.3266720","article-title":"Learning needle pick-and-place without expert demonstrations","volume":"8","author":"Bendikas","year":"2023","journal-title":"IEEE Robot. Autom. 
Lett"},{"key":"B3","doi-asserted-by":"publisher","first-page":"2874","DOI":"10.1109\/TITS.2022.3227738","article-title":"Modeling human driving behavior through generative adversarial imitation learning","volume":"24","author":"Bhattacharyya","year":"2023","journal-title":"IEEE Trans. Intell. Transpor. Syst"},{"key":"B4","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1108\/AA-11-2018-0188","article-title":"An extended dmp framework for robot learning and improving variable stiffness manipulation","volume":"40","author":"Bian","year":"2020","journal-title":"Assembly Autom"},{"key":"B5","doi-asserted-by":"publisher","first-page":"7863","DOI":"10.1109\/TNNLS.2021.3088947","article-title":"Complex robotic manipulation via graph-based hindsight goal generation","volume":"33","author":"Bing","year":"","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"B6","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1109\/MRA.2022.3204237","article-title":"Simulation to real: learning energy-efficient slithering gaits for a snake-like robot","volume":"29","author":"Bing","year":"","journal-title":"IEEE Robot. Autom. Magaz."},{"key":"B7","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2023.3270298","article-title":"Meta-reinforcement learning in nonstationary and nonparametric environments","author":"Bing","year":"","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"B8","doi-asserted-by":"publisher","first-page":"3476","DOI":"10.1109\/TPAMI.2022.3185549","article-title":"Meta-reinforcement learning in non-stationary and dynamic environments","volume":"45","author":"Bing","year":"","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"B9","doi-asserted-by":"publisher","first-page":"eadg7165","DOI":"10.1126\/scirobotics.adg7165","article-title":"Lateral flexion of a compliant spine improves motor performance in a bioinspired mouse robot","volume":"8","author":"Bing","year":"","journal-title":"Sci. 
Robot."},{"key":"B10","doi-asserted-by":"publisher","first-page":"2759","DOI":"10.1109\/TIE.2022.3172754","article-title":"Solving robotic manipulation with sparse reward reinforcement learning via graph-based diversity and proximity","volume":"70","author":"Bing","year":"","journal-title":"IEEE Trans. Indus. Electron."},{"key":"B11","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1145\/3243064.3243067","article-title":"Combining deep reinforcement learning with prior knowledge and reasoning","volume":"18","author":"Bougie","year":"2018","journal-title":"SIGAPP Appl. Comput. Rev"},{"key":"B12","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1016\/j.rcim.2018.12.007","article-title":"Trajectory generation for robotic assembly operations using learning by demonstration","volume":"57","author":"Duque","year":"2019","journal-title":"Robot. Comput. Integr. Manufact"},{"key":"B13","doi-asserted-by":"publisher","first-page":"103864","DOI":"10.1016\/j.robot.2021.103864","article-title":"Ring gaussian mixture modelling and regression for collaborative robots","volume":"145","author":"El Zaatari","year":"2021","journal-title":"Robot. Autonom. Syst"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341038","article-title":"\u201cLearning human navigation behavior using measured human trajectories in crowded spaces,\u201d","author":"Fahad","year":"2020","journal-title":"2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)"},{"key":"B15","doi-asserted-by":"publisher","first-page":"2276","DOI":"10.1109\/TRO.2021.3127108","article-title":"Constrained probabilistic movement primitives for robot trajectory adaptation","volume":"38","author":"Frank","year":"2022","journal-title":"IEEE Trans. 
Robot"},{"key":"B16","doi-asserted-by":"publisher","first-page":"4287","DOI":"10.1007\/s40747-022-00948-7","article-title":"Dm-dqn: dueling munchausen deep q network for robot path planning","volume":"9","author":"Gu","year":"2023","journal-title":"Complex Intell. Syst"},{"key":"B17","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAR49639.2020.9108072","article-title":"\u201cImitation learning for high precision peg-in-hole tasks,\u201d","author":"Gubbi","year":"2020","journal-title":"2020 6th International Conference on Control, Automation and Robotics (ICCAR)"},{"key":"B18","doi-asserted-by":"publisher","first-page":"106613","DOI":"10.1016\/j.engappai.2023.106613","article-title":"Optimal navigation for agvs: a soft actor-critic-based reinforcement learning approach with composite auxiliary rewards","volume":"124","author":"Guo","year":"2023","journal-title":"Eng. Applic. Artif. Intell"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2015.7139642","article-title":"\u201cInverse reinforcement learning of behavioral models for online-adapting navigation strategies,\u201d","author":"Herman","year":"2015","journal-title":"2015 IEEE International Conference on Robotics and Automation (ICRA)"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793867","article-title":"\u201cInverse reinforcement learning of interaction dynamics from demonstrations,\u201d","author":"Hussein","year":"2019","journal-title":"2019 International Conference on Robotics and Automation (ICRA)"},{"key":"B21","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1109\/MRA.2023.3262461","article-title":"Mastering the complex assembly task with a dual-arm robot: A novel reinforcement learning method","volume":"30","author":"Jiang","year":"2023","journal-title":"IEEE Robot. Autom. 
Magaz"},{"key":"B22","doi-asserted-by":"publisher","first-page":"101863","DOI":"10.1016\/j.rcim.2019.101863","article-title":"Reinforcement learning based on movement primitives for contact tasks","volume":"62","author":"Kim","year":"2020","journal-title":"Robot. Comput. Integr. Manuf"},{"key":"B23","doi-asserted-by":"publisher","first-page":"3719","DOI":"10.1109\/LRA.2019.2928760","article-title":"Learning intention aware online adaptation of movement primitives","volume":"4","author":"Koert","year":"2019","journal-title":"IEEE Robot. Autom. Lett"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1025","DOI":"10.1109\/TIV.2022.3198678","article-title":"Driver behavioral cloning for route following in autonomous vehicles using task knowledge distillation","volume":"8","author":"Li","year":"","journal-title":"IEEE Trans. Intell. Vehicles"},{"key":"B25","doi-asserted-by":"publisher","first-page":"2325","DOI":"10.1109\/LRA.2023.3248443","article-title":"Prodmp: a unified perspective on dynamic and probabilistic movement primitives","volume":"8","author":"Li","year":"","journal-title":"IEEE Robot. Autom. Lett"},{"key":"B26","doi-asserted-by":"publisher","first-page":"1149","DOI":"10.1109\/TMECH.2022.3212707","article-title":"Dynamic skill learning from human demonstration based on the human arm stiffness estimation model and Riemannian DMP","volume":"28","author":"Liao","year":"2023","journal-title":"IEEE\/ASME Trans. Mechatr"},{"key":"B27","doi-asserted-by":"publisher","first-page":"4492","DOI":"10.1109\/TII.2020.3020065","article-title":"Efficient insertion control for precision assembly based on demonstration learning and reinforcement learning","volume":"17","author":"Ma","year":"2021","journal-title":"IEEE Trans. Industr. 
Inform"},{"key":"B28","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1007\/s00422-014-0599-1","article-title":"Learning strategies in table tennis using inverse reinforcement learning","volume":"108","author":"Muelling","year":"2014","journal-title":"Biol. Cybern"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2015.7353496","article-title":"\u201cLearning optimal controllers in human-robot cooperative transportation tasks with position and force constraints,\u201d","author":"Rozo","year":"2015","journal-title":"2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793911","article-title":"\u201cDmp based trajectory tracking for a nonholonomic mobile robot with automatic goal adaptation and obstacle avoidance,\u201d","author":"Sharma","year":"2019","journal-title":"2019 International Conference on Robotics and Automation (ICRA)."},{"key":"B31","first-page":"447","article-title":"\u201cExploration from demonstration for interactive reinforcement learning,\u201d","volume-title":"Proceedings of the 2016 International Conference on Autonomous Agents &Multiagent Systems, AAMAS '16","author":"Subramanian","year":"2016"},{"key":"B32","first-page":"617","article-title":"\u201cIntegrating reinforcement learning with human demonstrations of varying ability,\u201d","author":"Taylor","year":"2011","journal-title":"The 10th International Conference on Autonomous Agents and Multiagent Systems"},{"key":"B33","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1109\/TCDS.2020.2968056","article-title":"A framework of hybrid force\/motion skills learning for robots","volume":"13","author":"Wang","year":"","journal-title":"IEEE Trans. Cogn. Dev. 
Syst"},{"key":"B34","doi-asserted-by":"publisher","first-page":"60175","DOI":"10.1109\/ACCESS.2021.3073711","article-title":"Hybrid trajectory and force learning of complex assembly tasks: a combined learning framework","volume":"9","author":"Wang","year":"","journal-title":"IEEE Access"},{"key":"B35","doi-asserted-by":"publisher","first-page":"777363","DOI":"10.3389\/frobt.2021.777363","article-title":"An adaptive imitation learning framework for robotic complex contact-rich insertion tasks","volume":"8","author":"Wang","year":"","journal-title":"Front. Robot. AI"},{"key":"B36","doi-asserted-by":"publisher","first-page":"1614","DOI":"10.1109\/TCYB.2022.3228578","article-title":"Expert system-based multiagent deep deterministic policy gradient for swarm robot decision making","volume":"54","author":"Wang","year":"","journal-title":"IEEE Trans. Cyber"},{"key":"B37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.robot.2016.06.003","article-title":"Neural inverse reinforcement learning in autonomous navigation","volume":"84","author":"Xia","year":"2016","journal-title":"Robot. Auton. Syst"},{"key":"B38","doi-asserted-by":"publisher","first-page":"1817","DOI":"10.3233\/JIFS-211999","article-title":"Path planning algorithm in complex environment based on DDPG and MPC","volume":"45","author":"Xue","year":"2023","journal-title":"J. Intell. Fuzzy Syst"},{"key":"B39","doi-asserted-by":"publisher","first-page":"1320251","DOI":"10.3389\/fnbot.2023.1320251","article-title":"Peg-in-hole assembly skill imitation learning method based on promps under task geometric representation","volume":"17","author":"Zang","year":"2023","journal-title":"Front. 
Neurorob"},{"key":"B40","doi-asserted-by":"publisher","DOI":"10.1109\/TIE.2023.3299051","article-title":"\u201cA learning-based two-stage method for submillimeter insertion tasks with only visual inputs,\u201d","author":"Zhao","year":"2023","journal-title":"IEEE Transactions on Industrial Electronics"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2024.1368243\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T04:33:53Z","timestamp":1710477233000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2024.1368243\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,15]]},"references-count":40,"alternative-id":["10.3389\/fnbot.2024.1368243"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2024.1368243","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,15]]},"article-number":"1368243"}}