{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T22:59:59Z","timestamp":1778799599355,"version":"3.51.4"},"reference-count":53,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[1997,9,1]],"date-time":"1997-09-01T00:00:00Z","timestamp":873072000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[1997,9]]},"abstract":"<jats:p>HQ-learning is a hierarchical extension of Q(\u03bb)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.<\/jats:p>","DOI":"10.1177\/105971239700600202","type":"journal-article","created":{"date-parts":[[2007,3,11]],"date-time":"2007-03-11T01:45:25Z","timestamp":1173577525000},"page":"219-246","source":"Crossref","is-referenced-by-count":113,"title":["HQ-Learning"],"prefix":"10.1177","volume":"6","author":[{"given":"Marco","family":"Wiering","sequence":"first","affiliation":[{"name":"Istituto Dalle Molle di Studi sull'Intelligenza Artificiale"}]},{"given":"J\u00fcrgen","family":"Schmidhuber","sequence":"additional","affiliation":[{"name":"Istituto Dalle Molle di Studi sull'Intelligenza Artificiale"}]}],"member":"179","published-online":{"date-parts":[[1997,9,1]]},"reference":[{"key":"atypb1","volume-title":"Computing optimal policies for partially observable decision processes using compact representations","author":"Boutilier, C.","year":"1996"},{"key":"atypb2","volume-title":"Training Q-agents (Tech. Rep. No. 
IRIDIA-94-14)","author":"Caironi, P. V. C.","year":"1994"},{"key":"atypb3","volume-title":"Proceedings of the Tenth International Conference on Artificial Intelligence","author":"Chrisman, L."},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1177\/105971239400300201"},{"key":"atypb5","doi-asserted-by":"publisher","DOI":"10.21236\/ADA290058"},{"key":"atypb6","volume-title":"Feudal reinforcement learning","author":"Dayan, P.","year":"1993"},{"key":"atypb7","volume-title":"From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior","author":"Digney, B."},{"key":"atypb8","volume-title":"Theory of optimal experiments","author":"Fedorov, V.V.","year":"1972"},{"key":"atypb9","volume-title":"Hierarchical recurrent neural networks for long-term dependencies","author":"Hihi, S.E.","year":"1996"},{"key":"atypb10","doi-asserted-by":"crossref","unstructured":"Hochreiter, S. & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735-1780.","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"atypb11","volume-title":"From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior","author":"Humphrys, M."},{"key":"atypb12","volume-title":"Reinforcement learning algorithm for partially observable Markov decision problems","author":"Jaakkola, T.","year":"1995"},{"key":"atypb13","volume-title":"Supervised learning with a distal teacher (Tech. Rep. Occ. Paper No. 40)","author":"Jordan, M.I.","year":"1990"},{"key":"atypb14","volume-title":"Planning and acting in partially observable stochastic domains (Unpublished Tech. 
rep.)","author":"Kaelbling, L.","year":"1995"},{"key":"atypb15","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114729"},{"issue":"3","key":"atypb16","first-page":"265","volume":"9","author":"Levin, L.A.","year":"1973","journal-title":"Problems of Information Transmission"},{"key":"atypb17","volume-title":"Reinforcement learning for robots using neural networks","author":"Lin, L.","year":"1993"},{"key":"atypb18","volume-title":"From animals to animats 3: Proceedings of the International Conference on Simulation of Adaptive Behavior","author":"Littman, M."},{"key":"atypb19","volume-title":"Algorithms for sequential decision making","author":"Littman, M.","year":"1996"},{"key":"atypb20","volume-title":"Machine learning: Proceedings of the Twelfth International Conference","author":"Littman, M."},{"key":"atypb21","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50031-9"},{"key":"atypb22","volume-title":"From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior","author":"McCallum, R.A."},{"key":"atypb23","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993104"},{"key":"atypb24","volume-title":"Proceedings of the International Joint Conference on Neural Networks","author":"Nguyen, D."},{"key":"atypb25","volume-title":"Approximating optimal policies for partially observable stochastic domains. 
In Proceedings of the International Joint Conference on Artificial Intelligence","author":"Parr, R.","year":"1995"},{"key":"atypb26","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114731"},{"key":"atypb27","volume-title":"Continual learning in reinforcement environments","author":"Ring, M.B.","year":"1994"},{"key":"atypb28","volume-title":"Proceedings of the Seventh Annual Conference on Computational Learning Theory","author":"Ron, D."},{"key":"atypb29","doi-asserted-by":"publisher","DOI":"10.1162\/evco.1997.5.2.123"},{"key":"atypb30","volume-title":"Proceedings of the International Joint Conference on Neural Networks","author":"Schmidhuber, J."},{"key":"atypb31","volume-title":"Learning to generate sub-goals for action sequences","author":"Schmidhuber, J.","year":"1991"},{"key":"atypb32","volume-title":"Reinforcement learning in Markovian and non-Markovian environments","author":"Schmidhuber, J.","year":"1991"},{"key":"atypb33","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1992.4.2.234"},{"key":"atypb34","volume-title":"What's interesting? (Tech. Rep. 
IDSIA-35-97)","author":"Schmidhuber, J.","year":"1997"},{"key":"atypb35","doi-asserted-by":"crossref","volume-title":"Reinforcement learning with self-modifying policies","author":"Schmidhuber, J.","DOI":"10.1007\/978-1-4615-5529-2_12"},{"key":"atypb36","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007383707642"},{"key":"atypb37","volume-title":"The efficient learning of multiple task sequences","author":"Singh, S.","year":"1992"},{"key":"atypb38","volume-title":"The optimal control of partially observable Markov decision processes","author":"Sondik, E.J.","year":"1971"},{"key":"atypb39","volume-title":"Proceedings of the International Conference on Artificial Neural Networks","author":"Storck, J."},{"key":"atypb40","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"atypb41","volume-title":"Machine learning: Proceedings of the Twelfth International Conference","author":"Sutton, R.S."},{"key":"atypb42","volume-title":"The evolution of mental models","author":"Teller, A.","year":"1994"},{"key":"atypb43","doi-asserted-by":"publisher","DOI":"10.1016\/0921-8890(95)00005-Z"},{"key":"atypb44","volume-title":"Efficient exploration in reinforcement learning (Tech. Rep. No. 
CMU-CS-92-102)","author":"Thrun, S.","year":"1992"},{"key":"atypb45","volume-title":"Learning from delayed rewards","author":"Watkins, C.","year":"1989"},{"key":"atypb46","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"atypb47","volume-title":"Reinforcement learning for the adaptive control of perception and action","author":"Whitehead, S.","year":"1992"},{"key":"atypb48","volume-title":"Machine learning: Proceedings of the Thirteenth International Conference","author":"Wiering, M."},{"key":"atypb49","doi-asserted-by":"publisher","DOI":"10.1162\/evco.1994.2.1.1"},{"key":"atypb50","doi-asserted-by":"publisher","DOI":"10.1162\/evco.1995.3.2.149"},{"key":"atypb51","volume-title":"From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior","author":"Wilson, S."},{"key":"atypb52","volume-title":"From animals to animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior","author":"Zhao, J."},{"key":"atypb53","volume-title":"Planning in stochastic domains: problem characteristics and approximations (Tech. Rep. No. 
HKUST-CS96-31)","author":"Zhang, N.L.","year":"1996"}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/105971239700600202","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/105971239700600202","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T16:18:23Z","timestamp":1777393103000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/105971239700600202"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1997,9]]},"references-count":53,"journal-issue":{"issue":"2","published-print":{"date-parts":[[1997,9]]}},"alternative-id":["10.1177\/105971239700600202"],"URL":"https:\/\/doi.org\/10.1177\/105971239700600202","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[1997,9]]}}}