{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T21:35:22Z","timestamp":1771104922314,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T00:00:00Z","timestamp":1716768000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T00:00:00Z","timestamp":1716768000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Hierarchical reinforcement learning (HRL) has achieved remarkable success and significant progress in complex and long-term decision-making problems. However, HRL training typically entails substantial computational costs and an enormous number of samples. One effective approach to tackle this challenge is hierarchical reinforcement learning from demonstrations (HRLfD), which leverages demonstrations to expedite the training process of HRL. The effectiveness of HRLfD is contingent upon the quality of the demonstrations; hence, suboptimal demonstrations may impede efficient learning. To address this issue, this paper proposes a reachability-based reward shaping (RbRS) method to alleviate the negative interference of suboptimal demonstrations for the HRL agent. The novel HRLfD algorithm based on RbRS is named HRLfD-RbRS, which incorporates the RbRS method to enhance the learning efficiency of HRLfD. Moreover, with the help of this method, the learning agent can explore better policies under the guidance of the suboptimal demonstration. 
We evaluate the proposed HRLfD-RbRS algorithm on various complex robotic tasks, and the experimental results demonstrate that our method outperforms current state-of-the-art HRLfD algorithms.<\/jats:p>","DOI":"10.1007\/s11063-024-11632-x","type":"journal-article","created":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T21:01:20Z","timestamp":1716843680000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Hierarchical Reinforcement Learning from Demonstration via Reachability-Based Reward Shaping"],"prefix":"10.1007","volume":"56","author":[{"given":"Xiaozhu","family":"Gao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinhui","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Wan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lingling","family":"An","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,5,27]]},"reference":[{"key":"11632_CR1","doi-asserted-by":"crossref","unstructured":"Mousavi SS, Schukat M, Howley E (2018) Deep reinforcement learning: an overview. In: Proceedings of SAI intelligent systems conference (IntelliSys) 2016: vol 2, pp 426\u2013440. Springer","DOI":"10.1007\/978-3-319-56991-8_32"},{"key":"11632_CR2","doi-asserted-by":"crossref","unstructured":"Fran\u00e7ois-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J (2018) An introduction to deep reinforcement learning","DOI":"10.1561\/9781680835397"},{"key":"11632_CR3","doi-asserted-by":"crossref","unstructured":"Henderson P, Islam R, Bachman P, Pineau J, Precup D, Meger D (2018) Deep reinforcement learning that matters. 
In: Proceedings of the AAAI conference on artificial intelligence, vol 32","DOI":"10.1609\/aaai.v32i1.11694"},{"issue":"4","key":"11632_CR4","doi-asserted-by":"publisher","first-page":"4152","DOI":"10.1109\/TPAMI.2022.3192418","volume":"45","author":"T Zhang","year":"2023","unstructured":"Zhang T, Guo S, Tan T, Hu X, Chen F (2023) Adjacency constraint for efficient hierarchical reinforcement learning. IEEE Trans Pattern Anal Mach Intell 45(4):4152\u20134166","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"5","key":"11632_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3453160","volume":"54","author":"S Pateria","year":"2021","unstructured":"Pateria S, Subagdja B, Tan A-H, Quek C (2021) Hierarchical reinforcement learning: a comprehensive survey. ACM Comput Surv (CSUR) 54(5):1\u201335","journal-title":"ACM Comput Surv (CSUR)"},{"issue":"1\u20132","key":"11632_CR6","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1022140919877","volume":"13","author":"AG Barto","year":"2003","unstructured":"Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete Event Dyn Syst 13(1\u20132):41\u201377","journal-title":"Discrete Event Dyn Syst"},{"key":"11632_CR7","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1016\/j.cogsys.2020.08.012","volume":"65","author":"A Skrynnik","year":"2021","unstructured":"Skrynnik A, Staroverov A, Aitygulov E, Aksenov K, Davydov V, Panov AI (2021) Hierarchical deep Q-network from imperfect demonstrations in Minecraft. Cogn Syst Res 65:74\u201378","journal-title":"Cogn Syst Res"},{"key":"11632_CR8","unstructured":"Le H, Jiang N, Agarwal A, Dud\u00edk M, Yue Y, Daum\u00e9 III H (2018) Hierarchical imitation and reinforcement learning, 2917\u20132926 . 
PMLR"},{"issue":"9","key":"11632_CR9","doi-asserted-by":"publisher","first-page":"5572","DOI":"10.1109\/TPAMI.2021.3069005","volume":"44","author":"S Guo","year":"2022","unstructured":"Guo S, Yan Q, Su X, Hu X, Chen F (2022) State-temporal compression in reinforcement learning with the reward-restricted geodesic metric. IEEE Trans Pattern Anal Mach Intell 44(9):5572\u20135589","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11632_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.106844","volume":"218","author":"A Skrynnik","year":"2021","unstructured":"Skrynnik A, Staroverov A, Aitygulov E, Aksenov K, Davydov V, Panov AI (2021) Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations. Knowl-Based Syst 218:106844","journal-title":"Knowl-Based Syst"},{"key":"11632_CR11","unstructured":"Kulkarni TD, Narasimhan K, Saeedi A, Tenenbaum J (2016) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. Advances in neural information processing systems 29"},{"key":"11632_CR12","unstructured":"Nachum O, Gu SS, Lee H, Levine S (2018) Data-efficient hierarchical reinforcement learning. Advances in neural information processing systems 31"},{"issue":"4","key":"11632_CR13","doi-asserted-by":"publisher","first-page":"1278","DOI":"10.3390\/s21041278","volume":"21","author":"J Hua","year":"2021","unstructured":"Hua J, Zeng L, Li G, Ju Z (2021) Learning for a robot: deep reinforcement learning, imitation learning, transfer learning. Sensors 21(4):1278","journal-title":"Sensors"},{"key":"11632_CR14","unstructured":"Gupta A, Kumar V, Lynch C, Levine S, Hausman K (2020) Relay policy learning: solving long-horizon tasks via imitation and reinforcement learning. In: Conference on robot learning, pp 1025\u20131037. 
PMLR"},{"issue":"136","key":"11632_CR15","first-page":"1","volume":"18","author":"C Wirth","year":"2017","unstructured":"Wirth C, Akrour R, Neumann G, F\u00fcrnkranz J et al (2017) A survey of preference-based reinforcement learning methods. J Mach Learn Res 18(136):1\u201346","journal-title":"J Mach Learn Res"},{"key":"11632_CR16","unstructured":"Lee K, Smith L, Dragan A, Abbeel P (2021) B-pref: Benchmarking preference-based reinforcement learning. In: Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 1)"},{"key":"11632_CR17","doi-asserted-by":"crossref","unstructured":"Wirth C, F\u00fcrnkranz J, Neumann G (2016) Model-free preference-based reinforcement learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 30","DOI":"10.1609\/aaai.v30i1.10269"},{"issue":"10","key":"11632_CR18","doi-asserted-by":"publisher","first-page":"4770","DOI":"10.1109\/TCYB.2020.2999492","volume":"51","author":"C Wang","year":"2020","unstructured":"Wang C, Bai X, Wang X, Liu X, Zhou J, Wu X, Li H, Tao D (2020) Self-supervised multiscale adversarial regression network for stereo disparity estimation. IEEE Trans Cybern 51(10):4770\u20134783","journal-title":"IEEE Trans Cybern"},{"key":"11632_CR19","unstructured":"Chen L, Paleja R, Gombolay M (2021) Learning from suboptimal demonstration via self-supervised reward regression. In: Conference on robot learning, pp 1262\u20131277. PMLR"},{"key":"11632_CR20","unstructured":"Shelhamer E, Mahmoudieh P, Argus M, Darrell T (2017) Loss is its own reward: self-supervision for reinforcement learning"},{"key":"11632_CR21","unstructured":"Kim C, Park J, Shin J, Lee H, Abbeel P, Lee K (2022) Preference transformer: modeling human preferences using transformers for RL. 
In: The eleventh international conference on learning representations"},{"key":"11632_CR22","doi-asserted-by":"crossref","unstructured":"Todorov E, Erez T, Tassa Y (2012) Mujoco: A physics engine for model-based control, 5026\u20135033. IEEE","DOI":"10.1109\/IROS.2012.6386109"},{"key":"11632_CR23","unstructured":"Florensa C, Duan Y, Abbeel P (2016) Stochastic neural networks for hierarchical reinforcement learning. In: International conference on learning representations"},{"key":"11632_CR24","unstructured":"Levy A, Konidaris G, Platt R, Saenko K (2018) Learning multi-level hierarchies with hindsight. In: International conference on learning representations"},{"key":"11632_CR25","unstructured":"Nachum O, Gu S, Lee H, Levine S (2018) Near-optimal representation learning for hierarchical reinforcement learning. In: International conference on learning representations"},{"key":"11632_CR26","unstructured":"Dayan P, Hinton GE (1992) Feudal reinforcement learning. Adv Neural Inf Process Syst 5"},{"key":"11632_CR27","doi-asserted-by":"crossref","unstructured":"Kim W, Lee C, Kim HJ (2018) Learning and generalization of dynamic movement primitives by hierarchical deep reinforcement learning from demonstration. In: 2018 IEEE\/RSJ international conference on intelligent robots and systems (IROS), pp 3117\u20133123 . IEEE","DOI":"10.1109\/IROS.2018.8594476"},{"key":"11632_CR28","unstructured":"Nachum O, Gu SS, Lee H, Levine S (2018) Data-efficient hierarchical reinforcement learning. Adv Neural Inf Process Syst 31"},{"key":"11632_CR29","unstructured":"Vezhnevets AS, Osindero S, Schaul T, Heess N, Jaderberg M, Silver D, Kavukcuoglu K (2017) Feudal networks for hierarchical reinforcement learning. In: International conference on machine learning, pp 3540\u20133549. 
PMLR"},{"issue":"4","key":"11632_CR30","doi-asserted-by":"publisher","first-page":"4661","DOI":"10.1007\/s11063-022-11058-3","volume":"55","author":"W Zhang","year":"2023","unstructured":"Zhang W, Ji M, Yu H, Zhen C (2023) Relp: reinforcement learning pruning method based on prior knowledge. Neural Process Lett 55(4):4661\u20134678","journal-title":"Neural Process Lett"},{"key":"11632_CR31","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1007\/s10489-018-1296-x","volume":"49","author":"X Zhao","year":"2019","unstructured":"Zhao X, Ding S, An Y, Jia W (2019) Applications of asynchronous deep reinforcement learning based on dynamic updating weights. Appl Intell 49:581\u2013591","journal-title":"Appl Intell"},{"issue":"5","key":"11632_CR32","doi-asserted-by":"publisher","first-page":"4341","DOI":"10.1007\/s11063-022-10811-y","volume":"54","author":"M Yi","year":"2022","unstructured":"Yi M, Yang P, Du M, Ma R (2022) DMADRL: A distributed multi-agent deep reinforcement learning algorithm for cognitive offloading in dynamic MEC networks. Neural Process Lett 54(5):4341\u20134373","journal-title":"Neural Process Lett"},{"key":"11632_CR33","doi-asserted-by":"publisher","first-page":"4889","DOI":"10.1007\/s10489-018-1241-z","volume":"48","author":"X Zhao","year":"2018","unstructured":"Zhao X, Ding S, An Y, Jia W (2018) Asynchronous reinforcement learning algorithms for solving discrete space path planning problems. Appl Intell 48:4889\u20134904","journal-title":"Appl Intell"},{"key":"11632_CR34","doi-asserted-by":"crossref","unstructured":"Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Horgan D, Quan J, Sendonaris A, Osband I, et al.: (2018) Deep q-learning from demonstrations. 
In: Proceedings of the AAAI conference on artificial intelligence, vol 32","DOI":"10.1609\/aaai.v32i1.11757"},{"issue":"2\u20133","key":"11632_CR35","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1177\/0278364918784350","volume":"38","author":"S Krishnan","year":"2019","unstructured":"Krishnan S, Garg A, Liaw R, Thananjeyan B, Miller L, Pokorny FT, Goldberg K (2019) Swirl: a sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards. Int J Robot Res 38(2\u20133):126\u2013145","journal-title":"Int J Robot Res"},{"issue":"6","key":"11632_CR36","first-page":"7","volume":"5","author":"R Fox","year":"2017","unstructured":"Fox R, Krishnan S, Stoica I, Goldberg K (2017) Multi-level discovery of deep options. Representations 5(6):7\u20138","journal-title":"Representations"},{"key":"11632_CR37","unstructured":"Brys T, Harutyunyan A, Suay HB, Chernova S, Taylor ME, Now\u00e9 A (2015) Reinforcement learning from demonstration through shaping. In: Twenty-fourth international joint conference on artificial intelligence"},{"key":"11632_CR38","doi-asserted-by":"crossref","unstructured":"Lin Z, Li J, Shi J, Ye D, Fu Q, Yang W (2022) Juewu-mc: playing minecraft with sample-efficient hierarchical reinforcement learning. In: IJCAI","DOI":"10.24963\/ijcai.2022\/452"},{"key":"11632_CR39","doi-asserted-by":"crossref","unstructured":"Abdelkareem Y, Shehata S, Karray F (2022) Advances in preference-based reinforcement learning: a review. In: 2022 IEEE international conference on systems, man, and cybernetics (SMC), pp 2527\u20132532. IEEE","DOI":"10.1109\/SMC53654.2022.9945333"},{"key":"11632_CR40","unstructured":"Pertsch K, Lee Y, Wu Y, Lim JJ (2021) Guided reinforcement learning with learned skills. 
In: 5th Annual conference on robot learning"},{"key":"11632_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-021-10085-1","volume":"55","author":"J Ram\u00edrez","year":"2022","unstructured":"Ram\u00edrez J, Yu W, Perrusqu\u00eda A (2022) Model-free reinforcement learning from expert demonstrations: a survey. Artif Intell Rev 55:1\u201329","journal-title":"Artif Intell Rev"},{"key":"11632_CR42","unstructured":"Taylor ME, Suay HB, Chernova S (2011) Integrating reinforcement learning with human demonstrations of varying ability. In: The 10th international conference on autonomous agents and multiagent systems, vol 2, pp 617\u2013624"},{"key":"11632_CR43","doi-asserted-by":"crossref","unstructured":"Nair A, McGrew B, Andrychowicz M, Zaremba W, Abbeel P (2018) Overcoming exploration in reinforcement learning with demonstrations. In: 2018 IEEE international conference on robotics and automation (ICRA), pp. 6292\u20136299. IEEE","DOI":"10.1109\/ICRA.2018.8463162"},{"issue":"2","key":"11632_CR44","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1109\/TITS.2020.3024655","volume":"23","author":"S Aradi","year":"2020","unstructured":"Aradi S (2020) Survey of deep reinforcement learning for motion planning of autonomous vehicles. IEEE Trans Intell Transp Syst 23(2):740\u2013759","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"11632_CR45","doi-asserted-by":"crossref","unstructured":"Wang VH, Pajarinen J, Wang T, K\u00e4m\u00e4r\u00e4inen J-K (2023) State-conditioned adversarial subgoal generation. In: Proceedings of the AAAI conference on artificial intelligence, vol 37, pp 10184\u201310191","DOI":"10.1609\/aaai.v37i8.26213"},{"key":"11632_CR46","first-page":"21579","volume":"33","author":"T Zhang","year":"2020","unstructured":"Zhang T, Guo S, Tan T, Hu X, Chen F (2020) Generating adjacency-constrained subgoals in hierarchical reinforcement learning. 
Adv Neural Inf Process Syst 33:21579\u201321590","journal-title":"Adv Neural Inf Process Syst"},{"issue":"9","key":"11632_CR47","doi-asserted-by":"publisher","first-page":"4727","DOI":"10.1109\/TNNLS.2021.3059912","volume":"33","author":"X Yang","year":"2021","unstructured":"Yang X, Ji Z, Wu J, Lai Y-K, Wei C, Liu G, Setchi R (2021) Hierarchical reinforcement learning with universal policies for multistep robotic manipulation. IEEE Trans Neural Netw Learn Syst 33(9):4727\u20134741","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"11632_CR48","doi-asserted-by":"crossref","unstructured":"Yang X, Ji Z, Wu J, Lai Y-K (2022) Abstract demonstrations and adaptive exploration for efficient and stable multi-step sparse reward reinforcement learning. In: 2022 27th international conference on automation and computing (ICAC), pp 1\u20136 . IEEE","DOI":"10.1109\/ICAC55051.2022.9911100"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11632-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11632-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11632-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T11:24:42Z","timestamp":1721042682000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11632-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,27]]},"references-count":48,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["11632"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11632-x","relation
":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,27]]},"assertion":[{"value":"29 April 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 May 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent Statement"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Institutional Review Board Statement"}}],"article-number":"184"}}