{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,23]],"date-time":"2025-10-23T05:47:06Z","timestamp":1761198426385,"version":"3.44.0"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"23","license":[{"start":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T00:00:00Z","timestamp":1733875200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T00:00:00Z","timestamp":1733875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["EXC 2075 - 390740016"],"award-info":[{"award-number":["EXC 2075 - 390740016"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["214434"],"award-info":[{"award-number":["214434"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"publisher","award":["801708","801708"],"award-info":[{"award-number":["801708","801708"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009534","name":"Universit\u00e4t 
Stuttgart","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100009534","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput & Applic"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>While deep reinforcement learning (RL) agents outperform humans on an increasing number of tasks, training them requires data equivalent to decades of human gameplay. Recent hierarchical RL methods have increased sample efficiency by incorporating information inherent to the structure of the decision problem but at the cost of having to discover or use human-annotated sub-goals that guide the learning process. We show that intentions of human players, i.e. the precursor of goal-oriented decisions, can be robustly predicted from eye gaze even for the long-horizon sparse rewards task of Montezuma\u2019s Revenge\u2013one of the most challenging RL tasks in the Atari2600 game suite. We propose <jats:italic>Int-HRL<\/jats:italic>: Hierarchical RL with intention-based sub-goals that are inferred from human eye gaze. Our novel sub-goal extraction pipeline is fully automatic and replaces the need for manual sub-goal annotation by human experts. 
Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions leads to an HRL agent that is significantly more sample efficient than previous methods.<\/jats:p>","DOI":"10.1007\/s00521-024-10596-2","type":"journal-article","created":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T15:33:30Z","timestamp":1733931210000},"page":"18823-18834","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Int-HRL: towards intention-based hierarchical reinforcement learning"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-8570-0007","authenticated-orcid":false,"given":"Anna","family":"Penzkofer","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Simon","family":"Schaefer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Florian","family":"Strohm","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mihai","family":"B\u00e2ce","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Leutenegger","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreas","family":"Bulling","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,12,11]]},"reference":[{"key":"10596_CR1","doi-asserted-by":"publisher","unstructured":"Kalashnikov D et al. (2021). Mt-opt: Continuous multi-task robotic reinforcement learning at scale https:\/\/doi.org\/10.48550\/arXiv.2104.08212","DOI":"10.48550\/arXiv.2104.08212"},{"key":"10596_CR2","unstructured":"Fan J (2022) A Review for Deep Reinforcement Learning in Atari:Benchmarks, Challenges, and Solutions . 
arXiv: org\/abs\/2112.04145"},{"key":"10596_CR3","unstructured":"Badia A P et al. (2020) Agent57: Outperforming the Atari Human Benchmark . arXiv:org\/abs\/2003.13350"},{"key":"10596_CR4","unstructured":"Revaud J et al. (2019) R2D2: Repeatable and Reliable Detector and Descriptor . arXiv:org\/abs\/1906.06195"},{"key":"10596_CR5","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals O et al (2019) Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature 575:350\u2013354. https:\/\/doi.org\/10.1038\/s41586-019-1724-z","journal-title":"Nature"},{"key":"10596_CR6","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","volume":"47","author":"MG Bellemare","year":"2013","unstructured":"Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: an evaluation platform for general agents. J Artif Intell Res 47:253\u2013279","journal-title":"J Artif Intell Res"},{"key":"10596_CR7","doi-asserted-by":"crossref","unstructured":"Zhang R et\u00a0al (2020) Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset. Proceedings of the AAAI Conference on Artificial Intelligence 34:6811\u20136820 (https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/6161)","DOI":"10.1609\/aaai.v34i04.6161"},{"key":"10596_CR8","doi-asserted-by":"publisher","unstructured":"Mnih, V et al. (2013) Playing atari with deep reinforcement learning . https:\/\/doi.org\/10.48550\/arXiv.1312.5602","DOI":"10.48550\/arXiv.1312.5602"},{"key":"10596_CR9","unstructured":"Badia, AP et al. (2020) Never Give Up: Learning Directed Exploration Strategies . arXiv: org\/abs\/2002.06038"},{"key":"10596_CR10","unstructured":"Vezhnevets AS et al. (2017) FeUdal Networks for Hierarchical Reinforcement Learning . arXiv: org\/abs\/1703.01161"},{"key":"10596_CR11","unstructured":"Kulkarni TD, Narasimhan KR, Saeedi A & Tenenbaum JB (2016). 
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation . arXiv: org\/abs\/1604.06057"},{"key":"10596_CR12","unstructured":"Le HM et al.(2018). Hierarchical Imitation and Reinforcement Learning arXiv: org\/abs\/1803.00590"},{"key":"10596_CR13","doi-asserted-by":"publisher","first-page":"1049","DOI":"10.3389\/fpsyg.2015.01049","volume":"6","author":"CM Huang","year":"2015","unstructured":"Huang CM, Andrist S, Saupp\u00e9 A, Mutlu B (2015) Using gaze patterns to predict task intent in collaboration. Front Psychol 6:1049","journal-title":"Front Psychol"},{"key":"10596_CR14","doi-asserted-by":"publisher","first-page":"103275","DOI":"10.1016\/j.artint.2020.103275","volume":"284","author":"R Singh","year":"2020","unstructured":"Singh R et al (2020) Combining gaze and AI planning for online human intention recognition. Artif Intell 284:103275","journal-title":"Artif Intell"},{"key":"10596_CR15","unstructured":"David-John B et al (2021) Towards gaze-based prediction of the intent to interact in virtual reality, ETRA \u201921 Short Papers (Association for Computing Machinery, New York. NY. USA. doi 10(1145\/3448018):3458008"},{"key":"10596_CR16","doi-asserted-by":"publisher","first-page":"1647","DOI":"10.3390\/electronics11101647","volume":"11","author":"X-L Chen","year":"2022","unstructured":"Chen X-L, Hou W-J (2022) Gaze-based interaction intention recognition in virtual reality. Electronics 11:1647","journal-title":"Electronics"},{"key":"10596_CR17","unstructured":"Belardinelli, A. (2023) Gaze-based intention estimation: principles, methodologies, and applications in HRI . arXiv:org\/abs\/2302.04530"},{"key":"10596_CR18","doi-asserted-by":"publisher","first-page":"20130480","DOI":"10.1098\/rstb.2013.0480","volume":"369","author":"M Botvinick","year":"2014","unstructured":"Botvinick M, Weinstein A (2014) Model-based hierarchical reinforcement learning and human action control. 
Philos Trans Royal Soc B: Biol Sci 369:20130480","journal-title":"Philos Trans Royal Soc B: Biol Sci"},{"key":"10596_CR19","doi-asserted-by":"crossref","unstructured":"Zhang R. et al. (2018) AGIL: Learning Attention from Human for Visuomotor Tasks . arXiv: org\/abs\/1806.03960","DOI":"10.1007\/978-3-030-01252-6_41"},{"key":"10596_CR20","unstructured":"Bellemare MG et al. (2016)Unifying Count-Based Exploration and Intrinsic Motivation (2016). arXiv: org\/abs\/1606.01868"},{"key":"10596_CR21","doi-asserted-by":"publisher","unstructured":"Salimans T, & Chen R (2018). Learning montezuma\u2019s revenge from a single demonstration.  Conference on Neural Information Processing Systems, Deep RL Workshop . https:\/\/doi.org\/10.48550\/arXiv.1812.03381","DOI":"10.48550\/arXiv.1812.03381"},{"key":"10596_CR22","doi-asserted-by":"publisher","unstructured":"Silva F, Warnell G, Costa A & Stone P (2020). Agents teaching agents: a survey on inter-agent transfer learning.  Auton Agent Multi-Agent Syst 34 . https:\/\/doi.org\/10.1007\/s10458-019-09430-0","DOI":"10.1007\/s10458-019-09430-0"},{"key":"10596_CR23","doi-asserted-by":"publisher","first-page":"3621","DOI":"10.1007\/s12652-021-03489-y","volume":"14","author":"A Bignold","year":"2021","unstructured":"Bignold A et al (2021) A conceptual framework for externally-influenced agents: an assisted reinforcement learning review. J Ambient Intell Humaniz Comput 14:3621. https:\/\/doi.org\/10.1007\/s12652-021-03489-y","journal-title":"J Ambient Intell Humaniz Comput"},{"key":"10596_CR24","volume-title":"Intra-Option Learning about Temporally Abstract Actions","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Precup D, Singh SP (1998) Intra-Option Learning about Temporally Abstract Actions. Morgan Kaufmann Publishers Inc., San Francisco"},{"key":"10596_CR25","unstructured":"Dayan P & Hinton GE (1992)  Feudal Reinforcement Learning, Vol. 5 (Morgan-Kaufmann, ). 
https:\/\/papers.nips.cc\/paper\/1992\/hash\/d14220ee66aeec73c49038385428ec4c-Abstract.html"},{"key":"10596_CR26","unstructured":"Ross S, Gordon G & Bagnell D (2011).  A Reduction of imitation learning and structured prediction to no-regret online learning, 627\u2013635 (JMLR Workshop and Conference Proceedings, 2011). https:\/\/proceedings.mlr.press\/v15\/ross11a.html"},{"key":"10596_CR27","doi-asserted-by":"publisher","unstructured":"Veeriah V et al.(2021). Discovery of options via meta-learned subgoals.  Conference on Neural Information Processing Systems . https:\/\/doi.org\/10.48550\/arXiv.2102.06741","DOI":"10.48550\/arXiv.2102.06741"},{"key":"10596_CR28","unstructured":"Nica A, Khetarpal K & Precup D (2022) The paradox of choice: using attention in hierarchical reinforcement learning . arXiv: org\/abs\/2201.09653"},{"key":"10596_CR29","volume-title":"The Child\u2019s Discovery of the Mind: The Developing Child","author":"JW Astington","year":"1994","unstructured":"Astington JW (1994) The Child\u2019s Discovery of the Mind: The Developing Child. Harvard University Press, Cambridge"},{"key":"10596_CR30","unstructured":"Burda Y, Edwards H, Storkey A & Klimov O (2018) Exploration by random network distillation . arXiv: org\/abs\/1810.12894"},{"key":"10596_CR31","unstructured":"Anand A et al. (2020) Unsupervised State representation learning in Atari . arXiv: org\/abs\/1906.08226"},{"key":"10596_CR32","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1287\/isre.2019.0907","volume":"31","author":"J Pfeiffer","year":"2020","unstructured":"Pfeiffer J, Pfeiffer T, Mei\u00dfner M, Wei\u00df E (2020) Eye-Tracking-based classification of information search behavior using machine learning: evidence from experiments in physical shops and virtual reality shopping environments. 
Inf Syst Res 31:675\u2013691","journal-title":"Inf Syst Res"},{"key":"10596_CR33","unstructured":"Neubeck A, Van Gool L (2006) Efficient non-maximum suppression 3: pp 850\u2013855 (https:\/\/ieeexplore.ieee.org\/document\/1699659)"},{"key":"10596_CR34","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/j.neunet.2020.05.004","volume":"129","author":"A Kroner","year":"2020","unstructured":"Kroner A, Senden M, Driessens K, Goebel R (2020) Contextual encoder-decoder network for visual saliency prediction. Neural Netw 129:261\u2013270","journal-title":"Neural Netw"},{"key":"10596_CR35","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1109\/TVCG.2017.2743939","volume":"24","author":"LE Matzen","year":"2018","unstructured":"Matzen LE, Haass MJ, Divis KM, Wang Z, Wilson AT (2018) Data visualization saliency model: a tool for evaluating abstract data visualizations. IEEE transactions on visualization and computer graphics 24:563\u2013573","journal-title":"IEEE transactions on visualization and computer graphics"},{"key":"10596_CR36","unstructured":"Borji A & Itti L (2015) CAT2000: A large scale fixation dataset for boosting saliency research . arXiv: org\/abs\/1505.03581"},{"key":"10596_CR37","doi-asserted-by":"crossref","unstructured":"van Hasselt H, Guez A & Silver D (2015) Deep Reinforcement Learning with Double Q-learning . arXiv: org\/abs\/1509.06461","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"10596_CR38","unstructured":"Schaul T, Quan J, Antonoglou I & Silver D (2016) Prioritized Experience Replay . arXiv: org\/abs\/1511.05952"},{"key":"10596_CR39","unstructured":"Henderson P et al. (2019) Deep reinforcement learning that matters . arXiv: org\/abs\/1709.06560"},{"key":"10596_CR40","unstructured":"Wang Z et al. (2016) Dueling network architectures for deep reinforcement learning . 
arXiv: org\/abs\/1511.06581"},{"key":"10596_CR41","unstructured":"Ren S, He K, Girshick R & Sun J (2016) Faster R-CNN: Towards real-time object detection with region proposal networks . arXiv: org\/abs\/1506.01497"},{"key":"10596_CR42","doi-asserted-by":"publisher","first-page":"3241","DOI":"10.1016\/S0042-6989(99)00014-0","volume":"39","author":"MW von Gr\u00fcnau","year":"1999","unstructured":"von Gr\u00fcnau MW, Faubert J, Iordanova M, Rajska D (1999) Flicker and the efficiency of cues for capturing attention. Vision Res 39:3241\u20133252","journal-title":"Vision Res"},{"key":"10596_CR43","doi-asserted-by":"crossref","unstructured":"Kempka M, Wydmuch M, Runc G, Toczek J & Ja\u015bkowski W (2016) ViZDoom: A Doom-based AI research platform for visual reinforcement learning . arXiv: org\/abs\/1605.02097","DOI":"10.1109\/CIG.2016.7860433"},{"key":"10596_CR44","unstructured":"Ruhdorfer C, Bortoletto M, Penzkofer A & Bulling A The Overcooked generalisation challenge . arXiv: org\/abs\/2406.17949"},{"key":"10596_CR45","unstructured":"Morad S, Kortvelesy R, Bettini M, Liwicki S & Prorok A (2022)  POPGym: Benchmarking Partially Observable Reinforcement Learning . https:\/\/openreview.net\/forum?id=chDrutUTs0K"},{"key":"10596_CR46","unstructured":"Nikulin A. et al. (2024) XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX . arXiv: org\/abs\/2312.12044"},{"key":"10596_CR47","unstructured":"Mannering W, Ford N, Harsono JJ & Winder J (2024) Generative artificial intelligence for behavioral intent prediction.  Proceedings of the Annual Meeting of the Cognitive Science Society  46 . https:\/\/escholarship.org\/uc\/item\/0gs9c90f"},{"key":"10596_CR48","unstructured":"Chevalier-Boisvert M et al. (2023) Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks"},{"key":"10596_CR49","doi-asserted-by":"publisher","unstructured":"Saran A Zhang R Short ES Niekum S (2021). 
Efficiently guiding imitation learning agents with human gaze. AAMAS 21:1109\u20131117. https:\/\/doi.org\/10.48550\/arXiv.2002.12500","DOI":"10.48550\/arXiv.2002.12500"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10596-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-10596-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10596-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T01:53:12Z","timestamp":1757123592000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-10596-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,11]]},"references-count":49,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10596"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-10596-2","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"type":"print","value":"0941-0643"},{"type":"electronic","value":"1433-3058"}],"subject":[],"published":{"date-parts":[[2024,12,11]]},"assertion":[{"value":"11 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 October 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 December 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}