{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T08:24:41Z","timestamp":1743582281579},"publisher-location":"Berlin, Heidelberg","reference-count":19,"publisher":"Springer Berlin Heidelberg","isbn-type":[{"type":"print","value":"9783540439417"},{"type":"electronic","value":"9783540456223"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2002]]},"DOI":"10.1007\/3-540-45622-8_15","type":"book-chapter","created":{"date-parts":[[2007,5,23]],"date-time":"2007-05-23T18:45:20Z","timestamp":1179945920000},"page":"196-211","source":"Crossref","is-referenced-by-count":16,"title":["Model Minimization in Hierarchical Reinforcement Learning"],"prefix":"10.1007","author":[{"given":"Balaraman","family":"Ravindran","sequence":"first","affiliation":[]},{"given":"Andrew G.","family":"Barto","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2002,7,9]]},"reference":[{"key":"15_CR1","unstructured":"C. Boutilier and R. Dearden. Using abstractions for decision theoretic planning with time constraints. In Proceedings of the AAAI-94, pages 1016\u20131022. AAAI, 1994."},{"key":"15_CR2","unstructured":"C. Boutilier, R. Dearden, and M. Goldszmidt. Exploiting structure in policy construction In Proceedings of International Joint Conference on Artificial Intelligence 14, pages 1104\u20131111, 1995."},{"key":"15_CR3","unstructured":"Steven J. Bradtke and Michael O. Duff. Reinforcement learning methods for continuous-time Markov decision problems. In Advances in Neural Information Processing Systems 7. MIT Press, 1995."},{"key":"15_CR4","unstructured":"Thomas Dean and Robert Givan. Model minimization in markov decision processes. In Proceedings of AAAI-97, pages 106\u2013111. AAAI, 1997."},{"issue":"1\/2","key":"15_CR5","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1007\/BF00625970","volume":"9","author":"E. A. Emerson","year":"1996","unstructured":"E. A. Emerson and A. P. Sistla. Symmetry and model checking. Formal Methods in System Design, 9(1\/2):105\u2013131, 1996.","journal-title":"Formal Methods in System Design"},{"key":"15_CR6","unstructured":"Robert Givan, Thomas Dean, and Matthew Greig. Equivalence notions and model minimization in markov decision processes. Submitted to Artificial Intelligence, 2001."},{"key":"15_CR7","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/S0004-3702(00)00047-3","volume":"122","author":"R. Givan","year":"2000","unstructured":"Robert Givan, Sonia Leach, and Thomas Dean. Bounded-parameter markov decision processes. Artificial Intelligence, 122:71\u2013109, 2000.","journal-title":"Artificial Intelligence"},{"issue":"2","key":"15_CR8","doi-asserted-by":"publisher","first-page":"562","DOI":"10.1214\/aop\/1176990441","volume":"19","author":"J. Glover","year":"1991","unstructured":"J. Glover. Symmetry groups and translation invariant representations of markov processes. The Annals of Probability, 19(2):562\u2013586, 1991.","journal-title":"The Annals of Probability"},{"key":"15_CR9","volume-title":"Algebraic Structure Theory of Sequential Machines","author":"J. Hartmanis","year":"1966","unstructured":"J. Hartmanis and R. E. Stearns. Algebraic Structure Theory of Sequential Machines. Prentice-Hall, Englewood Cliffs, NJ, 1966."},{"key":"15_CR10","first-page":"285","volume":"3","author":"G. A. Iba","year":"1989","unstructured":"Glenn A. Iba. A heuristic approach to the discovery of macro-operators. Machine Learning, 3:285\u2013317, 1989.","journal-title":"Machine Learning"},{"key":"15_CR11","doi-asserted-by":"publisher","first-page":"424","DOI":"10.1016\/S0019-9958(69)90505-1","volume":"15","author":"J. R. Jump","year":"1969","unstructured":"J. R. Jump. A note on the iterative decomposition of finite automata. Information and Control, 15:424\u2013435, 1969.","journal-title":"Information and Control"},{"issue":"1","key":"15_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/0890-5401(91)90030-6","volume":"94","author":"K. G. Larsen","year":"1991","unstructured":"K. G. Larsen and A. Skou. Bisimulation through probabilistic testing. Information and Computation, 94(1):1\u201328, 1991.","journal-title":"Information and Computation"},{"key":"15_CR13","doi-asserted-by":"crossref","unstructured":"D. Lee and M. Yannakakis. Online minimization of transition systems. In Proceed-ings of 24th Annual ACM Symposium on the Theory of Computing, pages 264\u2013274. ACM, 1992.","DOI":"10.1145\/129712.129738"},{"key":"15_CR14","series-title":"PhD thesis","volume-title":"Temporal Abstraction in Reinforcement Learning","author":"D. Precup","year":"2000","unstructured":"Doina Precup. Temporal Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst, May2000."},{"key":"15_CR15","volume-title":"Technical Report 01-43","author":"B. Ravindran","year":"2001","unstructured":"B. Ravindran and A. G. Barto. Symmetries and model minimization of markov decision processes. Technical Report 01-43, University of Massachusetts, Amherst, 2001."},{"key":"15_CR16","volume-title":"Reinforcement Learning. An Introduction","author":"R. S. Sutton","year":"1998","unstructured":"Richard S. Sutton and Andrew G. Barto. Reinforcement Learning. An Introduction. MIT Press, Cambridge, MA, 1998."},{"key":"15_CR17","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"R. S. Sutton","year":"1999","unstructured":"Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and Semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181\u2013211, 1999.","journal-title":"Artificial Intelligence"},{"key":"15_CR18","series-title":"PhD thesis","volume-title":"Learning from delayed rewards","author":"C. J. C. H. Watkins","year":"1989","unstructured":"C. J. C. H. Watkins. Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge, England, 1989."},{"key":"15_CR19","unstructured":"M. Zinkevich and T. Balch. Symmetry in markov decision processes and its implications for single agent and multi agent learning. In Proceedings of the 18th International Conference on Machine Learning, pages 632\u2013640, San Francisco, CA, 2001. Morgan Kaufmann."}],"container-title":["Lecture Notes in Computer Science","Abstraction, Reformulation, and Approximation"],"original-title":[],"link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/3-540-45622-8_15","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,28]],"date-time":"2019-04-28T07:31:31Z","timestamp":1556436691000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/3-540-45622-8_15"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2002]]},"ISBN":["9783540439417","9783540456223"],"references-count":19,"URL":"https:\/\/doi.org\/10.1007\/3-540-45622-8_15","relation":{},"ISSN":["0302-9743"],"issn-type":[{"type":"print","value":"0302-9743"}],"subject":[],"published":{"date-parts":[[2002]]}}}