{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T05:40:44Z","timestamp":1777182044602,"version":"3.51.4"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2024,5,28]],"date-time":"2024-05-28T00:00:00Z","timestamp":1716854400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,28]],"date-time":"2024-05-28T00:00:00Z","timestamp":1716854400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62373375"],"award-info":[{"award-number":["62373375"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reinforcement learning (RL) has achieved remarkable advancements in navigation tasks in recent years. However, tackling multi-goal navigation tasks with sparse rewards remains a complex and challenging problem due to the long-sequence decision-making involved. Such multi-goal navigation tasks inherently incorporate a hybrid action space, where the robot needs to select a navigation endpoint first before executing primitive actions. To address the problem of multi-goal navigation with sparse rewards, we introduce a novel hierarchical RL framework named Hierarchical RL with Multi-Goal (HRL-MG). The main idea of HRL-MG is to divide and conquer the hybrid action space, splitting long-sequence decisions into short-sequence decisions. The HRL-MG framework is composed of two main modules: a selector and an actuator. The selector employs a temporal abstraction hierarchical architecture designed to specify a desired end goal based on the discrete action space. Conversely, the actuator utilizes a continuous goal-oriented hierarchical architecture developed to enact continuous action sequences to reach the desired end goal specified by the selector. In addition, we incorporate a dynamic goal detection mechanism, grounded in hindsight experience replay, to mitigate the challenges posed by sparse reward landscapes. We validated the algorithm\u2019s efficacy on both the discrete environment Maze_2D and the continuous robotic environment MuJoCo \u2018Ant\u2019. The results indicate that HRL-MG significantly outperforms other methods in multi-goal navigation tasks with sparse rewards.<\/jats:p>","DOI":"10.1007\/s10462-024-10794-3","type":"journal-article","created":{"date-parts":[[2024,5,28]],"date-time":"2024-05-28T07:01:58Z","timestamp":1716879718000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Hierarchical reinforcement learning for handling sparse rewards in multi-goal navigation"],"prefix":"10.1007","volume":"57","author":[{"given":"Jiangyue","family":"Yan","sequence":"first","affiliation":[]},{"given":"Biao","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Xiaodong","family":"Xu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,28]]},"reference":[{"key":"10794_CR1","doi-asserted-by":"crossref","unstructured":"Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st international conference on machine learning, p1","DOI":"10.1145\/1015330.1015430"},{"key":"10794_CR2","unstructured":"Andrychowicz M, Wolski F, Ray A etal (2017) Hindsight experience replay. In: Proceedings of the 31st international conference on neural information processing systems, pp 5055\u20135065"},{"key":"10794_CR3","doi-asserted-by":"crossref","unstructured":"Bacon PL, Harb J, Precup D (2017) The option-critic architecture. In: Proceedings of the association advancement artificial intelligence, pp 1726\u20131734","DOI":"10.1609\/aaai.v31i1.10916"},{"issue":"1\u20132","key":"10794_CR4","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1022140919877","volume":"13","author":"AG Barto","year":"2003","unstructured":"Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discret Event Dyn Syst 13(1\u20132):41\u201377","journal-title":"Discret Event Dyn Syst"},{"key":"10794_CR5","unstructured":"Bellemare M, Srinivasan S, Ostrovski G et al (2016) Unifying count-based exploration and intrinsic motivation. In: Advances in neural information processing systems 29 (NIPS 2016), pp. 1471\u20131479"},{"issue":"6","key":"10794_CR6","doi-asserted-by":"publisher","first-page":"956","DOI":"10.1016\/j.conb.2012.05.008","volume":"22","author":"MM Botvinick","year":"2012","unstructured":"Botvinick MM (2012) Hierarchical reinforcement learning and decision making. Curr Opin Neurobiol 22(6):956\u2013962","journal-title":"Curr Opin Neurobiol"},{"key":"10794_CR7","unstructured":"Burda Y, Edwards H, Pathak D etal (2018) Large-scale study of curiosity-driven learning. arXiv preprint. arXiv:1808.04355"},{"issue":"1\u20132","key":"10794_CR8","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/S0004-3702(99)00070-3","volume":"114","author":"W Burgard","year":"1999","unstructured":"Burgard W, Cremers AB, Fox D et al (1999) Experiences with an interactive museum tour-guide robot. Artif Intell 114(1\u20132):3\u201355","journal-title":"Artif Intell"},{"key":"10794_CR9","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1613\/jair.639","volume":"13","author":"TG Dietterich","year":"2000","unstructured":"Dietterich TG (2000) Hierarchical reinforcement learning with the maxq value function decomposition. J Artif Intell Res 13:227\u2013303","journal-title":"J Artif Intell Res"},{"key":"10794_CR10","doi-asserted-by":"crossref","unstructured":"Dijkstra EW (2022) A note on two problems in connexion with graphs. In: Edsger Wybe Dijkstra: His Life, Work, and Legacy, pp 287-290","DOI":"10.1145\/3544585.3544600"},{"key":"10794_CR11","doi-asserted-by":"crossref","unstructured":"Fan Z, Su R, Zhang W etal (2019) Hybrid actor-critic reinforcement learning in parameterized action space. arXiv preprint. arXiv:1903.01344","DOI":"10.24963\/ijcai.2019\/316"},{"key":"10794_CR12","unstructured":"Fang M, Zhou T, Du Y etal (2019) Curriculum-guided hindsight experience replay. In: Advances in neural information processing systems, vol 32"},{"key":"10794_CR13","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1016\/j.artint.2014.11.009","volume":"247","author":"A Faust","year":"2017","unstructured":"Faust A, Palunko I, Cruz P et al (2017) Automated aerial suspended cargo delivery through reinforcement learning. Artif Intell 247:381\u2013398","journal-title":"Artif Intell"},{"key":"10794_CR14","first-page":"257","volume":"39","author":"EA Feinberg","year":"1994","unstructured":"Feinberg EA (1994) Constrained semi-markov decision processes with average rewards. Z Oper Res 39:257\u2013288","journal-title":"Z Oper Res"},{"key":"10794_CR16","unstructured":"Fu J, Luo K, Levine S (2017) Learning robust rewards with adversarial inverse reinforcement learning. arXiv preprint. arXiv:1710.11248"},{"key":"10794_CR15","doi-asserted-by":"crossref","unstructured":"Fu H, Tang H, Hao J etal (2019) Deep multi-agent reinforcement learning with discrete-continuous hybrid action spaces. arXiv preprint. arXiv:1903.04959","DOI":"10.24963\/ijcai.2019\/323"},{"issue":"8","key":"10794_CR17","doi-asserted-by":"publisher","first-page":"3796","DOI":"10.1109\/TNNLS.2021.3124466","volume":"34","author":"LC Garaffa","year":"2021","unstructured":"Garaffa LC, Basso M, Konzen AA et al (2021) Reinforcement learning for mobile robotics exploration: a survey. IEEE Trans Neural Netw Learn Syst 34(8):3796\u20133810","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"2","key":"10794_CR18","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","volume":"4","author":"PE Hart","year":"1968","unstructured":"Hart PE, Nilsson NJ, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybern 4(2):100\u2013107","journal-title":"IEEE Trans Syst Sci Cybern"},{"key":"10794_CR19","unstructured":"Hausknecht M, Stone P (2015) Deep reinforcement learning in parameterized action space. arXiv preprint. arXiv:1511.04143"},{"issue":"2","key":"10794_CR20","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1109\/TLA.2017.7854617","volume":"15","author":"K Hernandez","year":"2017","unstructured":"Hernandez K, Bacca B, Posso B (2017) Multi-goal path planning autonomous system for picking up and delivery tasks in mobile robotics. IEEE Lat Am Trans 15(2):232\u2013238","journal-title":"IEEE Lat Am Trans"},{"key":"10794_CR21","unstructured":"Ho J, Ermon S (2016) Generative adversarial imitation learning. In: Advances in Neural Information Processing Systems (NeurIPS), vol 29, pp 4565\u20134573"},{"issue":"11","key":"10794_CR22","doi-asserted-by":"publisher","first-page":"1289","DOI":"10.1177\/0278364915619772","volume":"35","author":"H Kretzschmar","year":"2016","unstructured":"Kretzschmar H, Spies M, Sprunk C et al (2016) Socially compliant mobile robot navigation via inverse reinforcement learning. Int J Robot Res 35(11):1289\u20131307","journal-title":"Int J Robot Res"},{"key":"10794_CR23","unstructured":"Kulkarni TD, Narasimhan K, Saeedi A etal (2016) Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In: Advances in neural information processing systems, vol 29"},{"key":"10794_CR24","unstructured":"LaValle S (1998) Rapidly-exploring random trees: a new tool for path planning. Research Report 9811"},{"key":"10794_CR25","unstructured":"Levy A, Konidaris G, Platt R etal (2017) Learning multi-level hierarchies with hindsight. arXiv preprint. arXiv:1712.00948"},{"key":"10794_CR26","unstructured":"Li B, Tang H, Zheng Y etal (2021) Hyar: Addressing discrete-continuous action reinforcement learning via hybrid action representation. arXiv preprint. arXiv:2109.05490"},{"key":"10794_CR27","unstructured":"Li C, Xia F, Martin-Martin R etal (2020) Hrl4in: Hierarchical reinforcement learning for interactive navigation with mobile manipulators. In: Conference on robot learning, PMLR, pp 603\u2013616"},{"key":"10794_CR28","doi-asserted-by":"crossref","unstructured":"Liu M, Zhu M, Zhang W (2022) Goal-conditioned reinforcement learning: problems and solutions. arXiv preprint. arXiv:2201.08299","DOI":"10.24963\/ijcai.2022\/770"},{"key":"10794_CR29","unstructured":"Nachum O, Gu SS, Lee H etal (2018) Data-efficient hierarchical reinforcement learning. In: Advances in neural information processing systems, vol 31"},{"key":"10794_CR30","unstructured":"Ng AY, Russell S etal (2000) Algorithms for inverse reinforcement learning. In: Proceedings of 17th international conference on machine learning, pp 663\u2013670"},{"key":"10794_CR31","unstructured":"Parr R, Russell S (1997) Reinforcement learning with hierarchies of machines. In: Advances in neural information processing systems, vol 10"},{"issue":"5","key":"10794_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3453160","volume":"54","author":"S Pateria","year":"2021","unstructured":"Pateria S, Subagdja B, Tan Ah et al (2021) Hierarchical reinforcement learning: a comprehensive survey. ACM Comput Surv (CSUR) 54(5):1\u201335","journal-title":"ACM Comput Surv (CSUR)"},{"key":"10794_CR33","doi-asserted-by":"crossref","unstructured":"Pathak D, Agrawal P, Efros AA etal (2017) Curiosity-driven exploration by self-supervised prediction. In: International conference on machine learning, PMLR, pp 2778\u20132787","DOI":"10.1109\/CVPRW.2017.70"},{"key":"10794_CR34","unstructured":"Plappert M, Andrychowicz M, Ray A etal (2018) Multi-goal reinforcement learning: challenging robotics environments and request for research. arXiv preprint. arXiv:1802.09464"},{"key":"10794_CR35","doi-asserted-by":"crossref","unstructured":"Reizinger P, Szemenyei M (2020) Attention-based curiosity-driven exploration in deep reinforcement learning. In: International conference on acoustics, speech and signal processing. IEEE, pp 3542\u20133546","DOI":"10.1109\/ICASSP40776.2020.9054546"},{"key":"10794_CR36","unstructured":"Schaul T, Horgan D, Gregor K etal (2015) Universal value function approximators. In: International conference on machine learning, PMLR, pp 1312\u20131320"},{"key":"10794_CR37","unstructured":"Schulman J, Moritz P, Levine S etal (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint. arXiv:1506.02438"},{"key":"10794_CR38","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1016\/j.promfg.2018.02.058","volume":"20","author":"R Shah","year":"2018","unstructured":"Shah R, Pandey A (2018) Concept for automated sorting robotic arm. Procedia Manuf 20:400\u2013405","journal-title":"Procedia Manuf"},{"key":"10794_CR39","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT, Cambridge"},{"issue":"1\u20132","key":"10794_CR40","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton RS, Precup D, Singh S (1999) Between MDPS and semi-MDPS: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1\u20132):181\u2013211","journal-title":"Artif Intell"},{"key":"10794_CR41","unstructured":"Tang H, Houthooft R, Foote D etal (2017) # exploration: a study of count-based exploration for deep reinforcement learning. In: Proceedings of the 31st international conference on neural information processing systems, pp 2750\u20132759"},{"key":"10794_CR42","doi-asserted-by":"crossref","unstructured":"Todorov E, Erez T, Tassa Y (2012) Mujoco: a physics engine for model-based control. In: International conference on intelligent robots and systems. IEEE, pp 5026\u20135033","DOI":"10.1109\/IROS.2012.6386109"},{"key":"10794_CR43","unstructured":"Trott A, Zheng S, Xiong C etal (2019) Keeping your distance: solving sparse reward tasks using self-balancing shaped rewards. In: Advances in neural information processing systems, vol 32"},{"key":"10794_CR44","unstructured":"Vezhnevets AS, Osindero S, Schaul T etal (2017) Feudal networks for hierarchical reinforcement learning. In: International conference on machine learning, PMLR, pp 3540\u20133549"},{"issue":"7","key":"10794_CR45","doi-asserted-by":"publisher","first-page":"6180","DOI":"10.1109\/JIOT.2020.2973193","volume":"7","author":"C Wang","year":"2020","unstructured":"Wang C, Wang J, Wang J et al (2020) Deep-reinforcement-learning-based autonomous UAV navigation with sparse rewards. IEEE Internet Things J 7(7):6180\u20136190","journal-title":"IEEE Internet Things J"},{"key":"10794_CR46","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJ Watkins","year":"1992","unstructured":"Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8:279\u2013292","journal-title":"Mach Learn"},{"key":"10794_CR47","unstructured":"Xiong J, Wang Q, Yang Z etal (2018) Parametrized deep q-networks learning: Reinforcement learning with discrete-continuous hybrid action space. arXiv preprint. arXiv:1810.06394"},{"key":"10794_CR48","unstructured":"Zhelo O, Zhang J, Tai L etal (2018) Curiosity-driven exploration for mapless navigation with deep reinforcement learning. arXiv preprint. arXiv:1804.00456"},{"key":"10794_CR49","doi-asserted-by":"crossref","unstructured":"Zhu Y, Mottaghi R, Kolve E etal (2017) Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: International conference on robotics and automation. IEEE, pp 3357\u20133364","DOI":"10.1109\/ICRA.2017.7989381"},{"key":"10794_CR50","first-page":"1433","volume-title":"Maximum entropy inverse reinforcement learning","author":"BD Ziebart","year":"2008","unstructured":"Ziebart BD, Maas AL, Bagnell JA et al (2008) Maximum entropy inverse reinforcement learning. Aaai, Chicago, pp 1433\u20131438"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10794-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-024-10794-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10794-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T05:05:35Z","timestamp":1732079135000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-024-10794-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,28]]},"references-count":50,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["10794"],"URL":"https:\/\/doi.org\/10.1007\/s10462-024-10794-3","relation":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,28]]},"assertion":[{"value":"6 May 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 May 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"156"}}