{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T22:39:31Z","timestamp":1768689571646,"version":"3.49.0"},"publisher-location":"Berlin, Heidelberg","reference-count":78,"publisher":"Springer Berlin Heidelberg","isbn-type":[{"value":"9783642276446","type":"print"},{"value":"9783642276453","type":"electronic"}],"license":[{"start":{"date-parts":[[2012,1,1]],"date-time":"2012-01-01T00:00:00Z","timestamp":1325376000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2012,1,1]],"date-time":"2012-01-01T00:00:00Z","timestamp":1325376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012]]},"DOI":"10.1007\/978-3-642-27645-3_9","type":"book-chapter","created":{"date-parts":[[2012,3,5]],"date-time":"2012-03-05T22:18:12Z","timestamp":1330985892000},"page":"293-323","source":"Crossref","is-referenced-by-count":15,"title":["Hierarchical Approaches"],"prefix":"10.1007","author":[{"given":"Bernhard","family":"Hengst","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","reference":[{"key":"9_CR1","unstructured":"Agre, P.E., Chapman, D.: Pengi: an implementation of a theory of activity. In: Proceedings of the Sixth National Conference on Artificial Intelligence, AAAI 1987, vol.\u00a01, pp. 268\u2013272. AAAI Press (1987)"},{"key":"9_CR2","first-page":"131","volume-title":"Machine Intelligence","author":"S. Amarel","year":"1968","unstructured":"Amarel, S.: On representations of problems of reasoning about actions. In: Michie, D. (ed.) Machine Intelligence, vol.\u00a03, pp. 131\u2013171. 
Edinburgh at the University Press, Edinburgh (1968)"},{"key":"9_CR3","unstructured":"Andre, D., Russell, S.J.: Programmable reinforcement learning agents. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) NIPS, pp. 1019\u20131025. MIT Press (2000)"},{"key":"9_CR4","unstructured":"Andre, D., Russell, S.J.: State abstraction for programmable reinforcement learning agents. In: Dechter, R., Kearns, M., Sutton, R.S. (eds.) Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 119\u2013125. AAAI Press (2002)"},{"key":"9_CR5","volume-title":"Design for a Brain: The Origin of Adaptive Behaviour","author":"R. Ashby","year":"1952","unstructured":"Ashby, R.: Design for a Brain: The Origin of Adaptive Behaviour. Chapman & Hall, London (1952)"},{"key":"9_CR6","volume-title":"Introduction to Cybernetics","author":"R. Ashby","year":"1956","unstructured":"Ashby, R.: Introduction to Cybernetics. Chapman & Hall, London (1956)"},{"key":"9_CR7","unstructured":"Bakker, B., Schmidhuber, J.: Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In: Proceedings of the 8th Conference on Intelligent Autonomous Systems, IAS-8, pp. 438\u2013445 (2004)"},{"key":"9_CR8","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1022140919877","volume":"13","author":"A.G. Barto","year":"2003","unstructured":"Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Special Issue on Reinforcement Learning, Discrete Event Systems Journal\u00a013, 41\u201377 (2003)","journal-title":"Special Issue on Reinforcement Learning, Discrete Event Systems Journal"},{"key":"9_CR9","doi-asserted-by":"crossref","DOI":"10.1515\/9781400874668","volume-title":"Adaptive Control Processes: A Guided Tour","author":"R. Bellman","year":"1961","unstructured":"Bellman, R.: Adaptive Control Processes: A Guided Tour. 
Princeton University Press, Princeton (1961)"},{"key":"9_CR10","first-page":"1104","volume-title":"Proceedings of the 14th International Joint Conference on Artificial Intelligence","author":"C. Boutilier","year":"1995","unstructured":"Boutilier, C., Dearden, R., Goldszmidt, M.: Exploiting structure in policy construction. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, vol.\u00a02, pp. 1104\u20131111. Morgan Kaufmann Publishers Inc., San Francisco (1995)"},{"key":"9_CR11","unstructured":"Boutilier, C., Reiter, R., Soutchanski, M., Thrun, S.: Decision-theoretic, high-level agent programming in the situation calculus. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pp. 355\u2013362. AAAI Press (2000)"},{"key":"9_CR12","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/S0921-8890(05)80025-9","volume":"6","author":"R.A. Brooks","year":"1990","unstructured":"Brooks, R.A.: Elephants don\u2019t play chess. Robotics and Autonomous Systems\u00a06, 3\u201315 (1990)","journal-title":"Robotics and Autonomous Systems"},{"key":"9_CR13","first-page":"1399","volume-title":"Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2010","author":"P.S. Castro","year":"2010","unstructured":"Castro, P.S., Precup, D.: Using bisimulation for policy transfer in mdps. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2010, vol.\u00a01, pp. 1399\u20131400. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2010)"},{"issue":"1","key":"9_CR14","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1017\/S0140525X97000022","volume":"20","author":"A. Clark","year":"1997","unstructured":"Clark, A., Thornton, C.: Trading spaces: Computation, representation, and the limits of uninformed learning. 
Behavioral and Brain Sciences\u00a020(1), 57\u201366 (1997)","journal-title":"Behavioral and Brain Sciences"},{"key":"9_CR15","unstructured":"Dayan, P., Hinton, G.E.: Feudal reinforcement learning. In: Advances in Neural Information Processing Systems (NIPS), vol.\u00a05 (1992)"},{"key":"9_CR16","unstructured":"Dean, T., Givan, R.: Model minimization in Markov decision processes. In: AAAI\/IAAI, pp. 106\u2013111 (1997)"},{"key":"9_CR17","unstructured":"Dean, T., Lin, S.H.: Decomposition techniques for planning in stochastic domains. Tech. Rep. CS-95-10, Department of Computer Science Brown University (1995)"},{"key":"9_CR18","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1613\/jair.639","volume":"13","author":"T.G. Dietterich","year":"2000","unstructured":"Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research\u00a013, 227\u2013303 (2000)","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9_CR19","doi-asserted-by":"crossref","unstructured":"Digney, B.L.: Learning hierarchical control structures for multiple tasks and changing environments. From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behaviour SAB (1998)","DOI":"10.7551\/mitpress\/3119.003.0050"},{"issue":"11","key":"9_CR20","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1016\/j.robot.2008.08.010","volume":"56","author":"A. Ferrein","year":"2008","unstructured":"Ferrein, A., Lakemeyer, G.: Logic-based robot control in highly dynamic domains. Robot Auton. Syst.\u00a056(11), 980\u2013991 (2008)","journal-title":"Robot Auton. Syst."},{"key":"9_CR21","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1007\/11589990_19","volume-title":"AI 2005: Advances in Artificial Intelligence","author":"R. 
Fitch","year":"2005","unstructured":"Fitch, R., Hengst, B., \u0160uc, D., Calbert, G., Scholz, J.: Structural Abstraction Experiments in Reinforcement Learning. In: Zhang, S., Jarvis, R.A. (eds.) AI 2005. LNCS (LNAI), vol.\u00a03809, pp. 164\u2013175. Springer, Heidelberg (2005)"},{"key":"9_CR22","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1109\/TAC.1978.1101707","volume":"23","author":"J. Forestier","year":"1978","unstructured":"Forestier, J., Varaiya, P.: Multilayer control of large Markov chains. IEEE Transactions on Automatic Control\u00a023, 298\u2013304 (1978)","journal-title":"IEEE Transactions on Automatic Control"},{"key":"9_CR23","unstructured":"Gamow, G., Stern, M.: Puzzle-math. Viking Press (1958)"},{"key":"9_CR24","first-page":"186","volume-title":"Proc. 18th International Conf. on Machine Learning","author":"M. Ghavamzadeh","year":"2001","unstructured":"Ghavamzadeh, M., Mahadevan, S.: Continuous-time hierarchical reinforcement learning. In: Proc. 18th International Conf. on Machine Learning, pp. 186\u2013193. Morgan Kaufmann, San Francisco (2001)"},{"key":"9_CR25","unstructured":"Ghavamzadeh, M., Mahadevan, S.: Hierarchical policy gradient algorithms. In: Marine Environments, pp. 226\u2013233. AAAI Press (2003)"},{"key":"9_CR26","unstructured":"Hauskrecht, M., Meuleau, N., Kaelbling, L.P., Dean, T., Boutilier, C.: Hierarchical solution of Markov decision processes using macro-actions. In: Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 220\u2013229 (1998)"},{"key":"9_CR27","unstructured":"Hengst, B.: Discovering hierarchy in reinforcement learning with HEXQ. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the Nineteenth International Conference on Machine Learning, pp. 243\u2013250. 
Morgan Kaufmann (2002)"},{"key":"9_CR28","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1007\/978-3-540-30115-8_16","volume-title":"Machine Learning: ECML 2004","author":"B. Hengst","year":"2004","unstructured":"Hengst, B.: Model Approximation for HEXQ Hierarchical Reinforcement Learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol.\u00a03201, pp. 144\u2013155. Springer, Heidelberg (2004)"},{"key":"9_CR29","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1007\/978-3-540-76928-6_8","volume-title":"AI 2007: Advances in Artificial Intelligence","author":"B. Hengst","year":"2007","unstructured":"Hengst, B.: Safe State Abstraction and Reusable Continuing Subtasks in Hierarchical Reinforcement Learning. In: Orgun, M.A., Thornton, J. (eds.) AI 2007. LNCS (LNAI), vol.\u00a04830, pp. 58\u201367. Springer, Heidelberg (2007)"},{"key":"9_CR30","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1007\/978-3-540-89378-3_14","volume-title":"AI 2008: Advances in Artificial Intelligence","author":"B. Hengst","year":"2008","unstructured":"Hengst, B.: Partial Order Hierarchical Reinforcement Learning. In: Wobcke, W., Zhang, M. (eds.) AI 2008. LNCS (LNAI), vol.\u00a05360, pp. 138\u2013149. Springer, Heidelberg (2008)"},{"key":"9_CR31","unstructured":"Hernandez, N., Mahadevan, S.: Hierarchical memory-based reinforcement learning. In: Fifteenth International Conference on Neural Information Processing Systems, Denver (2000)"},{"key":"9_CR32","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1007\/978-3-540-68677-4_8","volume-title":"Artificial General Intelligence","author":"M. Hutter","year":"2007","unstructured":"Hutter, M.: Universal algorithmic intelligence: A mathematical top\u2192down approach. 
In: Artificial General Intelligence, pp. 227\u2013290. Springer, Berlin (2007)"},{"key":"9_CR33","doi-asserted-by":"crossref","unstructured":"Jong, N.K., Stone, P.: Compositional models for reinforcement learning. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2009)","DOI":"10.1007\/978-3-642-04180-8_59"},{"key":"9_CR34","first-page":"2259","volume":"7","author":"A. Jonsson","year":"2006","unstructured":"Jonsson, A., Barto, A.G.: Causal graph based decomposition of factored mdps. Journal of Machine Learning\u00a07, 2259\u20132301 (2006)","journal-title":"Journal of Machine Learning"},{"key":"9_CR35","first-page":"167","volume-title":"Proceedings of the Tenth International Conference Machine Learning","author":"L.P. Kaelbling","year":"1993","unstructured":"Kaelbling, L.P.: Hierarchical learning in stochastic domains: Preliminary results. In: Proceedings of the Tenth International Conference Machine Learning, pp. 167\u2013173. Morgan Kaufmann, San Mateo (1993)"},{"key":"9_CR36","first-page":"895","volume-title":"Proceedings of the 20th International Joint Conference on Artificial Intelligence","author":"G. Konidaris","year":"2007","unstructured":"Konidaris, G., Barto, A.G.: Building portable options: skill transfer in reinforcement learning. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 895\u2013900. Morgan Kaufmann Publishers Inc., San Francisco (2007)"},{"key":"9_CR37","unstructured":"Konidaris, G., Barto, A.G.: Skill discovery in continuous reinforcement learning domains using skill chaining. In: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol.\u00a022, pp. 1015\u20131023 (2009)"},{"key":"9_CR38","unstructured":"Konidaris, G., Kuindersma, S., Barto, A.G., Grupen, R.: Constructing skill trees for reinforcement learning agents from demonstration trajectories. 
In: Advances in Neural Information Processing Systems NIPS, vol.\u00a023 (2010)"},{"key":"9_CR39","volume-title":"Learning to Solve Problems by Searching for Macro-Operators","author":"R.E. Korf","year":"1985","unstructured":"Korf, R.E.: Learning to Solve Problems by Searching for Macro-Operators. Pitman Publishing Inc., Boston (1985)"},{"key":"9_CR40","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1016\/S0743-1066(96)00121-5","volume":"31","author":"H. Levesque","year":"1997","unstructured":"Levesque, H., Reiter, R., Lesp\u00e9rance, Y., Lin, F., Scherl, R.: Golog: A logic programming language for dynamic domains. Journal of Logic Programming\u00a031, 59\u201384 (1997)","journal-title":"Journal of Logic Programming"},{"key":"9_CR41","doi-asserted-by":"crossref","unstructured":"Mahadevan, S.: Representation discovery in sequential decision making. In: 24th Conference on Artificial Intelligence (AAAI), Atlanta, July 11-15 (2010)","DOI":"10.1609\/aaai.v24i1.7766"},{"key":"9_CR42","unstructured":"Marthi, B., Russell, S., Latham, D., Guestrin, C.: Concurrent hierarchical reinforcement learning. In: Proc. IJCAI 2005 Edinburgh, Scotland (2005)"},{"key":"9_CR43","unstructured":"Marthi, B., Kaelbling, L., Lozano-Perez, T.: Learning hierarchical structure in policies. In: NIPS 2007 Workshop on Hierarchical Organization of Behavior (2007)"},{"key":"9_CR44","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1007\/3-540-45622-8_34","volume-title":"Abstraction, Reformulation, and Approximation","author":"A. McGovern","year":"2002","unstructured":"McGovern, A.: Autonomous Discovery of Abstractions Through Interaction with an Environment. In: Koenig, S., Holte, R.C. (eds.) SARA 2002. LNCS (LNAI), vol.\u00a02371, pp. 338\u2013339. Springer, Heidelberg (2002)"},{"key":"9_CR45","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1007\/s10994-008-5061-y","volume":"73","author":"N. 
Mehta","year":"2008","unstructured":"Mehta, N., Natarajan, S., Tadepalli, P., Fern, A.: Transfer in variable-reward hierarchical reinforcement learning. Mach. Learn.\u00a073, 289\u2013312 (2008a), doi:10.1007\/s10994-008-5061-y","journal-title":"Mach. Learn."},{"key":"9_CR46","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1145\/1390156.1390238","volume-title":"Proceedings of the 25th International Conference on Machine Learning, ICML 2008","author":"N. Mehta","year":"2008","unstructured":"Mehta, N., Ray, S., Tadepalli, P., Dietterich, T.: Automatic discovery and transfer of maxq hierarchies. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 648\u2013655. ACM, New York (2008b)"},{"key":"9_CR47","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1007\/3-540-36755-1_25","volume-title":"Machine Learning: ECML 2002","author":"I. Menache","year":"2002","unstructured":"Menache, I., Mannor, S., Shimkin, N.: Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol.\u00a02430, pp. 295\u2013305. Springer, Heidelberg (2002)"},{"key":"9_CR48","unstructured":"Moerman, W.: Hierarchical reinforcement learning: Assignment of behaviours to subpolicies by self-organization. PhD thesis, Cognitive Artificial Intelligence, Utrecht University (2009)"},{"key":"9_CR49","first-page":"1316","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence","author":"A. Moore","year":"1999","unstructured":"Moore, A., Baird, L., Kaelbling, L.P.: Multi-value-functions: Efficient automatic action hierarchies for multiple goal mdps. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1316\u20131323. 
Morgan Kaufmann, San Francisco (1999)"},{"key":"9_CR50","first-page":"1175","volume-title":"Proceedings of the 21st International Joint Conference on Artificial Intelligence","author":"J. Mugan","year":"2009","unstructured":"Mugan, J., Kuipers, B.: Autonomously learning an action hierarchy using a learned qualitative state representation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1175\u20131180. Morgan Kaufmann Publishers Inc., San Francisco (2009)"},{"key":"9_CR51","first-page":"753","volume-title":"Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009","author":"G. Neumann","year":"2009","unstructured":"Neumann, G., Maass, W., Peters, J.: Learning complex motions by sequencing simpler motion templates. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 753\u2013760. ACM, New York (2009)"},{"key":"9_CR52","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1613\/jair.30","volume":"1","author":"N.J. Nilsson","year":"1994","unstructured":"Nilsson, N.J.: Teleo-reactive programs for agent control. Journal of Artificial Intelligence Research\u00a01, 139\u2013158 (1994)","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9_CR53","unstructured":"Osentoski, S., Mahadevan, S.: Basis function construction for hierarchical reinforcement learning. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2010, vol. 1, pp. 747\u2013754. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2010)"},{"key":"9_CR54","unstructured":"Parr, R., Russell, S.J.: Reinforcement learning with hierarchies of machines. In: NIPS (1997)"},{"key":"9_CR55","unstructured":"Parr, R.E.: Hierarchical control and learning for Markov decision processes. 
PhD thesis, University of California at Berkeley (1998)"},{"key":"9_CR56","unstructured":"Pineau, J., Thrun, S.: An integrated approach to hierarchy and abstraction for pomdps. CMU Technical Report: CMU-RI-TR-02-21 (2002)"},{"key":"9_CR57","doi-asserted-by":"crossref","unstructured":"Polya, G.: How to Solve It: A New Aspect of Mathematical Method. Princeton University Press (1945)","DOI":"10.1515\/9781400828678"},{"key":"9_CR58","unstructured":"Precup, D., Sutton, R.S.: Multi-time models for temporally abstract planning. In: Advances in Neural Information Processing Systems, vol.\u00a010, pp. 1050\u20131056. MIT Press (1997)"},{"key":"9_CR59","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"M.L. Puterman","year":"1994","unstructured":"Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)"},{"key":"9_CR60","unstructured":"Ravindran, B., Barto, A.G.: SMDP homomorphisms: An algebraic approach to abstraction in semi Markov decision processes. In: Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, IJCAI 2003 (2003)"},{"key":"9_CR61","first-page":"472","volume-title":"Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence","author":"K. Rohanimanesh","year":"2001","unstructured":"Rohanimanesh, K., Mahadevan, S.: Decision-theoretic planning with concurrent temporally extended actions. In: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pp. 472\u2013479. Morgan Kaufmann Publishers Inc., San Francisco (2001)"},{"key":"9_CR62","first-page":"720","volume-title":"ICML 2005: Proceedings of the 22nd international conference on Machine learning","author":"K. 
Rohanimanesh","year":"2005","unstructured":"Rohanimanesh, K., Mahadevan, S.: Coarticulation: an approach for generating concurrent plans in Markov decision processes. In: ICML 2005: Proceedings of the 22nd international conference on Machine learning, pp. 720\u2013727. ACM Press, New York (2005)"},{"key":"9_CR63","volume-title":"Artificial Intelligence: A Modern Approach","author":"S. Russell","year":"1995","unstructured":"Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (1995)"},{"key":"9_CR64","unstructured":"Ryan, M.R.K.: Hierarchical Decision Making. In: Handbook of Learning and Approximate Dynamic Programming. IEEE Press Series on Computational Intelligence. Wiley-IEEE Press (2004)"},{"key":"9_CR65","series-title":"Lecture Notes in Artificial Intelligence","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1007\/3-540-44960-4_11","volume-title":"Inductive Logic Programming","author":"M.D. Reid","year":"2000","unstructured":"Reid, M.D., Ryan, M.: Using ILP to Improve Planning in Hierarchical Reinforcement Learning. In: Cussens, J., Frisch, A.M. (eds.) ILP 2000. LNCS (LNAI), vol.\u00a01866, pp. 174\u2013190. Springer, Heidelberg (2000)"},{"key":"9_CR66","doi-asserted-by":"crossref","unstructured":"Si, J., Barto, A.G., Powell, W.B., Wunsch, D.: Handbook of Learning and Approximate Dynamic Programming. IEEE Press Series on Computational Intelligence. Wiley-IEEE Press (2004)","DOI":"10.1109\/9780470544785"},{"key":"9_CR67","volume-title":"The Sciences of the Artificial","author":"H.A. Simon","year":"1996","unstructured":"Simon, H.A.: The Sciences of the Artificial, 3rd edn. MIT Press, Cambridge (1996)","edition":"3"},{"key":"9_CR68","doi-asserted-by":"crossref","unstructured":"\u015eim\u015fek, O., Barto, A.G.: Using relative novelty to identify useful temporal abstractions in reinforcement learning. 
In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004 (2004)","DOI":"10.1145\/1015330.1015353"},{"key":"9_CR69","unstructured":"Singh, S.: Reinforcement learning with a hierarchy of abstract models. In: Proceedings of the Tenth National Conference on Artificial Intelligence (1992)"},{"key":"9_CR70","unstructured":"Stone, P.: Layered learning in multi-agent systems. PhD, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (1998)"},{"key":"9_CR71","unstructured":"Strehl, A.L., Diuk, C., Littman, M.L.: Efficient structure learning in factored-state mdps. In: Proceedings of the 22nd National Conference on Artificial Intelligence, vol.\u00a01, pp. 645\u2013650. AAAI Press (2007)"},{"issue":"1-2","key":"9_CR72","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"R.S. Sutton","year":"1999","unstructured":"Sutton, R.S., Precup, D., Singh, S.P.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence\u00a0112(1-2), 181\u2013211 (1999)","journal-title":"Artificial Intelligence"},{"issue":"1","key":"9_CR73","first-page":"1633","volume":"10","author":"M.E. Taylor","year":"2009","unstructured":"Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research\u00a010(1), 1633\u20131685 (2009)","journal-title":"Journal of Machine Learning Research"},{"key":"9_CR74","unstructured":"Theocharous, G., Kaelbling, L.P.: Approximate planning in POMDPS with macro-actions. In: Advances in Neural Information Processing Systems 16 (NIPS-2003) (2004) (to appear)"},{"key":"9_CR75","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"S. Thrun","year":"1995","unstructured":"Thrun, S., Schwartz, A.: Finding structure in reinforcement learning. In: Tesauro, G., Touretzky, D., Leen, T. (eds.) 
Advances in Neural Information Processing Systems (NIPS), vol.\u00a07. MIT Press, Cambridge (1995)"},{"key":"9_CR76","doi-asserted-by":"crossref","unstructured":"Utgoff, P.E., Stracuzzi, D.J.: Many-layered learning. In: Neural Computation. MIT Press Journals (2002)","DOI":"10.1162\/08997660260293319"},{"key":"9_CR77","unstructured":"Watkins CJCH, Learning from delayed rewards. PhD thesis, King\u2019s College (1989)"},{"key":"9_CR78","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1177\/105971239700600202","volume":"6","author":"M. Wiering","year":"1997","unstructured":"Wiering, M., Schmidhuber, J.: HQ-learning. Adaptive Behavior\u00a06, 219\u2013246 (1997)","journal-title":"Adaptive Behavior"}],"container-title":["Adaptation, Learning, and Optimization","Reinforcement Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-642-27645-3_9","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,21]],"date-time":"2024-04-21T03:12:45Z","timestamp":1713669165000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-642-27645-3_9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012]]},"ISBN":["9783642276446","9783642276453"],"references-count":78,"URL":"https:\/\/doi.org\/10.1007\/978-3-642-27645-3_9","relation":{},"ISSN":["1867-4534","1867-4542"],"issn-type":[{"value":"1867-4534","type":"print"},{"value":"1867-4542","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012]]}}}