{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T10:47:37Z","timestamp":1752230857683},"publisher-location":"Berlin, Heidelberg","reference-count":93,"publisher":"Springer Berlin Heidelberg","isbn-type":[{"type":"print","value":"9783540415978"},{"type":"electronic","value":"9783540445654"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2000]]},"DOI":"10.1007\/3-540-44565-x_10","type":"book-chapter","created":{"date-parts":[[2007,8,11]],"date-time":"2007-08-11T09:48:14Z","timestamp":1186825694000},"page":"213-240","source":"Crossref","is-referenced-by-count":7,"title":["Sequential Decision Making Based on Direct Search"],"prefix":"10.1007","author":[{"given":"J\u00fcrgen","family":"Schmidhuber","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2001,12,7]]},"reference":[{"key":"10_CR1","unstructured":"Andre, D. (1998). Learning hierarchical behaviors. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR2","doi-asserted-by":"crossref","DOI":"10.1007\/BFb0055923","volume-title":"Genetic Programming \u2014 An Introduction","author":"W. Banzhaf","year":"1998","unstructured":"Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. (1998). Genetic Programming \u2014 An Introduction. Morgan Kaufmann Publishers, San Francisco, CA, USA."},{"key":"10_CR3","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","volume":"SMC-13","author":"A. G. Barto","year":"1983","unstructured":"Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. 
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834\u2013846.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"10_CR4","volume-title":"Toward code evolution by artificial economies","author":"E. B. Baum","year":"1998","unstructured":"Baum, E. B., & Durdanovic, I. (1998). Toward code evolution by artificial economies. Tech. rep., NEC Research Institute, Princeton, NJ. Extension of a paper in Proc. 13th ICML\u20191996, Morgan Kaufmann, CA."},{"key":"10_CR5","doi-asserted-by":"crossref","unstructured":"Bellman, R. (1961). Adaptive Control Processes. Princeton University Press.","DOI":"10.1515\/9781400874668"},{"key":"10_CR6","volume-title":"Neuro-dynamic Programming","author":"D. P. Bertsekas","year":"1996","unstructured":"Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic Programming. Athena Scientific, Belmont, MA."},{"key":"10_CR7","unstructured":"Bowling, M., & Veloso, M. (1998). Bounding the suboptimality of reusing sub-problems. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR8","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1145\/321495.321506","volume":"16","author":"G. Chaitin","year":"1969","unstructured":"Chaitin, G. (1969). On the length of programs for computing finite binary sequences: statistical considerations. Journal of the ACM, 16, 145\u2013159.","journal-title":"Journal of the ACM"},{"key":"10_CR9","unstructured":"Coelho, J., & Grupen, R. A. (1998). Control abstractions as state representation. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR10","first-page":"679","volume-title":"Advances in Neural Information Processing Systems","author":"D. A. Cohn","year":"1994","unstructured":"Cohn, D. A. (1994). Neural network exploration using optimal experiment design. In Cowan, J., Tesauro, G., & Alspector, J. (Eds.), Advances in Neural Information Processing Systems 6, pp. 679\u2013686. 
San Mateo, CA: Morgan Kaufmann."},{"key":"10_CR11","volume-title":"Proceedings of an International Conference on Genetic Algorithms and Their Applications","author":"N. L. Cramer","year":"1985","unstructured":"Cramer, N. L. (1985). A representation for the adaptive generation of simple sequential programs. In Grefenstette, J. (Ed.), Proceedings of an International Conference on Genetic Algorithms and Their Applications Hillsdale NJ. Lawrence Erlbaum Associates."},{"key":"10_CR12","first-page":"271","volume-title":"Advances in Neural Information Processing Systems","author":"P. Dayan","year":"1993","unstructured":"Dayan, P., & Hinton, G. (1993). Feudal reinforcement learning. In Lippman, D. S., Moody, J. E., & Touretzky, D. S. (Eds.), Advances in Neural Information Processing Systems 5, pp. 271\u2013278. San Mateo, CA: Morgan Kaufmann."},{"key":"10_CR13","first-page":"5","volume":"25","author":"P. Dayan","year":"1996","unstructured":"Dayan, P., & Sejnowski, T. J. (1996). Exploration bonuses and dual control. Machine Learning, 25, 5\u201322.","journal-title":"Machine Learning"},{"key":"10_CR14","unstructured":"Dickmanns, D., Schmidhuber, J., & Winklhofer, A. (1987). Der genetische Algorithmus: Eine Implementierung in Prolog. Fortgeschrittenenpraktikum, Institut f\u00fcr Informatik, Lehrstuhl Prof. Radig, Technische Universit\u00e4t M\u00fcnchen."},{"key":"10_CR15","doi-asserted-by":"crossref","unstructured":"Digney, B. (1996). Emergent hierarchical control structures: Learning reactive\/hierarchical relationships in reinforcement environments. In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., & Wilson, S. W. (Eds.), From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pp. 363\u2013372. MIT Press, Bradford Books.","DOI":"10.7551\/mitpress\/3118.003.0044"},{"key":"10_CR16","first-page":"III-145","volume-title":"World Congress on Neural Networks","author":"M. 
Eldracher","year":"1993","unstructured":"Eldracher, M., & Baginski, B. (1993). Neural subgoal generation using backpropagation. In Lendaris, G. G., Grossberg, S., & Kosko, B. (Eds.), World Congress on Neural Networks, pp. III-145\u2013III-148. Lawrence Erlbaum Associates, Inc., Publishers, Hillsdale."},{"key":"10_CR17","unstructured":"Fedorov, V. V. (1972). Theory of optimal experiments. Academic Press."},{"key":"10_CR18","volume-title":"Wiley-Interscience series in systems and optimization","author":"J. C. Gittins","year":"1989","unstructured":"Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY."},{"key":"10_CR19","unstructured":"Harada, D., & Russell, S. (1998). Meta-level reinforcement learning. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR20","first-page":"473","volume-title":"Advances in Neural Information Processing Systems","author":"S. Hochreiter","year":"1997","unstructured":"Hochreiter, S., & Schmidhuber, J. (1997). LSTM can solve hard long time lag problems. In Mozer, M. C., Jordan, M. I., & Petsche, T. (Eds.), Advances in Neural Information Processing Systems 9, pp. 473\u2013479. MIT Press, Cambridge MA."},{"key":"10_CR21","volume-title":"Adaptation in Natural and Artificial Systems","author":"J. H. Holland","year":"1975","unstructured":"Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor."},{"key":"10_CR22","unstructured":"Holland, J. H. (1985). Properties of the bucket brigade. In Proceedings of an International Conference on Genetic Algorithms. Hillsdale, NJ."},{"key":"10_CR23","unstructured":"Huber, M., & Grupen, R. A. (1998). Learning robot control using control policies as abstract actions. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR24","doi-asserted-by":"crossref","unstructured":"Humphrys, M. (1996). 
Action selection methods using reinforcement learning. In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., & Wilson, S. W. (Eds.), From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pp. 135\u2013144. MIT Press, Bradford Books.","DOI":"10.7551\/mitpress\/3118.003.0018"},{"issue":"1","key":"10_CR25","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1109\/72.80299","volume":"2","author":"J. Hwang","year":"1991","unstructured":"Hwang, J., Choi, J., Oh, S., & II, R. J. M. (1991). Query-based learning applied to partially trained multilayer perceptrons. IEEE Transactions on Neural Networks, 2(1), 131\u2013136.","journal-title":"IEEE Transactions on Neural Networks"},{"key":"10_CR26","first-page":"345","volume-title":"Advances in Neural Information Processing Systems","author":"T. Jaakkola","year":"1995","unstructured":"Jaakkola, T., Singh, S. P., & Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In Tesauro, G., Touretzky, D. S., & Leen, T. K. (Eds.), Advances in Neural Information Processing Systems 7, pp. 345\u2013352. MIT Press, Cambridge MA."},{"key":"10_CR27","first-page":"430","volume-title":"Advances in Neural Information Processing Systems","author":"A. Juels","year":"1996","unstructured":"Juels, A., & Wattenberg, M. (1996). Stochastic hillclimbing as a baseline method for evaluating genetic algorithms. In Touretzky, D. S., Mozer, M. C., & Hasselmo, M. E. (Eds.), Advances in Neural Information Processing Systems, Vol. 8, pp. 430\u2013436. The MIT Press, Cambridge, MA."},{"key":"10_CR28","doi-asserted-by":"crossref","unstructured":"Kaelbling, L. (1993). Learning in Embedded Systems. MIT Press.","DOI":"10.7551\/mitpress\/4168.001.0001"},{"key":"10_CR29","volume-title":"Planning and acting in partially observable stochastic domains","author":"L. 
Kaelbling","year":"1995","unstructured":"Kaelbling, L., Littman, M., & Cassandra, A. (1995). Planning and acting in partially observable stochastic domains. Tech. rep., Brown University, Providence RI."},{"key":"10_CR30","volume-title":"Advances in Neural Information Processing Systems","author":"M. Kearns","year":"1999","unstructured":"Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Kearns, M., Solla, S. A., & Cohn, D. (Eds.), Advances in Neural Information Processing Systems 12. MIT Press, Cambridge MA."},{"key":"10_CR31","unstructured":"Kirchner, F. (1998). Q-learning of complex behaviors on a six-legged walking machine. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR32","first-page":"228","volume":"22","author":"S. Koenig","year":"1996","unstructured":"Koenig, S., & Simmons, R. G. (1996). The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms. Machine Learning, 22, 228\u2013250.","journal-title":"Machine Learning"},{"key":"10_CR33","first-page":"1","volume":"1","author":"A. Kolmogorov","year":"1965","unstructured":"Kolmogorov, A. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1, 1\u201311.","journal-title":"Problems of Information Transmission"},{"key":"10_CR34","first-page":"121","volume":"10","author":"P. Koumoutsakos","year":"1998","unstructured":"Koumoutsakos P., F. J., & D., P. (1998). Evolution strategies for parameter optimization in jet flow control. Center for Turbulence Research \u2014 Proceedings of the Summer program 1998, 10, 121\u2013132.","journal-title":"Center for Turbulence Research \u2014 Proceedings of the Summer program 1998"},{"key":"10_CR35","doi-asserted-by":"crossref","unstructured":"Lenat, D. (1983). Theory formation by heuristic search. 
Machine Learning, 21.","DOI":"10.1016\/S0004-3702(83)80004-6"},{"issue":"3","key":"10_CR36","first-page":"265","volume":"9","author":"L. A. Levin","year":"1973","unstructured":"Levin, L. A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265\u2013266.","journal-title":"Problems of Information Transmission"},{"key":"10_CR37","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/S0019-9958(84)80060-1","volume":"61","author":"L. A. Levin","year":"1984","unstructured":"Levin, L. A. (1984). Randomness conservation inequalities: Information and independence in mathematical theories. Information and Control, 61, 15\u201337.","journal-title":"Information and Control"},{"key":"10_CR38","doi-asserted-by":"crossref","unstructured":"Li, M., & Vit\u00e1nyi, P. M. B. (1993). An Introduction to Kolmogorov Complexity and its Applications. Springer.","DOI":"10.1007\/978-1-4757-3860-5"},{"key":"10_CR39","volume-title":"Reinforcement Learning for Robots Using Neural Networks","author":"L. Lin","year":"1993","unstructured":"Lin, L. (1993). Reinforcement Learning for Robots Using Neural Networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh."},{"key":"10_CR40","unstructured":"Littman, M. (1996). Algorithms for Sequential Decision Making. Ph.D. thesis, Brown University."},{"key":"10_CR41","first-page":"362","volume-title":"Machine Learning: Proceedings of the Twelfth International Conference","author":"M. Littman","year":"1995","unstructured":"Littman, M., Cassandra, A., & Kaelbling, L. (1995). Learning policies for partially observable environments: Scaling up. In Prieditis, A., & Russell, S. (Eds.), Machine Learning: Proceedings of the Twelfth International Conference, pp. 362\u2013370. Morgan Kaufmann Publishers, San Francisco, CA."},{"issue":"2","key":"10_CR42","first-page":"550","volume":"4","author":"D. J. C. MacKay","year":"1992","unstructured":"MacKay, D. J. C. (1992). 
Information-based objective functions for active data selection. Neural Computation, 4(2), 550\u2013604.","journal-title":"Neural Computation"},{"key":"10_CR43","doi-asserted-by":"crossref","unstructured":"McCallum, R. A. (1996). Learning to use selective attention and short-term memory in sequential tasks. In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., & Wilson, S. W. (Eds.), From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pp. 315\u2013324. MIT Press, Bradford Books.","DOI":"10.7551\/mitpress\/3118.003.0039"},{"key":"10_CR44","unstructured":"McGovern, A. (1998). acquire-macros: An algorithm for automatically learning macro-action. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR45","first-page":"103","volume":"13","author":"A. Moore","year":"1993","unstructured":"Moore, A., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, 103\u2013130.","journal-title":"Machine Learning"},{"key":"10_CR46","unstructured":"Moore, A. W., Baird, L., & Kaelbling, L. P. (1998). Multi-value-functions: Efficient automatic action hierarchies for multiple goal mdps. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR47","first-page":"1135","volume-title":"Advances in Neural Information Processing Systems","author":"M. Plutowski","year":"1994","unstructured":"Plutowski, M., Cottrell, G., & White, H. (1994). Learning Mackey-Glass from 25 examples, plus or minus 2. In Cowan, J., Tesauro, G., & Alspector, J. (Eds.), Advances in Neural Information Processing Systems 6, pp. 1135\u20131142. San Mateo, CA: Morgan Kaufmann."},{"key":"10_CR48","unstructured":"Ray, T. S. (1992). An approach to the synthesis of life. In Langton, C., Taylor, C., Farmer, J. D., & Rasmussen, S. (Eds.), Artificial Life II, pp. 371\u2013408. 
Addison Wesley Publishing Company."},{"key":"10_CR49","unstructured":"Rechenberg, I. (1971). Evolutionsstrategie-Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Dissertation. Published 1973 by Frommann-Holzboog."},{"key":"10_CR50","doi-asserted-by":"crossref","unstructured":"Ring, M. B. (1991). Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies. In Birnbaum, L., & Collins, G. (Eds.), Machine Learning: Proceedings of the Eighth International Workshop, pp. 343\u2013347. Morgan Kaufmann.","DOI":"10.1016\/B978-1-55860-200-7.50071-4"},{"key":"10_CR51","unstructured":"Ring, M. B. (1993). Learning sequential tasks by incrementally adding higher orders. In Hanson, S. J., Cowan, J. D., & Giles, C. L. (Eds.), Advances in Neural Information Processing Systems 5, pp. 115\u2013122. Morgan Kaufmann."},{"key":"10_CR52","volume-title":"Continual Learning in Reinforcement Environments","author":"M. B. Ring","year":"1994","unstructured":"Ring, M. B. (1994). Continual Learning in Reinforcement Environments. Ph.D. thesis, University of Texas at Austin, Austin, Texas 78712."},{"issue":"2","key":"10_CR53","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1162\/evco.1997.5.2.123","volume":"5","author":"R. P. Salustowicz","year":"1997","unstructured":"Salustowicz, R. P., & Schmidhuber, J. (1997). Probabilistic incremental program evolution. Evolutionary Computation, 5(2), 123\u2013141.","journal-title":"Evolutionary Computation"},{"key":"10_CR54","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1147\/rd.33.0210","volume":"3","author":"A. L. Samuel","year":"1959","unstructured":"Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, 3, 210\u2013229.","journal-title":"IBM Journal on Research and Development"},{"key":"10_CR55","unstructured":"Schmidhuber, J. (1987). 
Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Institut f\u00fcr Informatik, Technische Universit\u00e4t M\u00fcnchen."},{"issue":"4","key":"10_CR56","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1080\/09540098908915650","volume":"1","author":"J. Schmidhuber","year":"1989","unstructured":"Schmidhuber, J. (1989). A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4), 403\u2013412.","journal-title":"Connection Science"},{"key":"10_CR57","doi-asserted-by":"publisher","first-page":"1458","DOI":"10.1109\/IJCNN.1991.170605","volume":"2","author":"J. Schmidhuber","year":"1991","unstructured":"Schmidhuber, J. (1991a). Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore, Vol. 2, pp. 1458\u20131463. IEEE.","journal-title":"Proc. International Joint Conference on Neural Networks, Singapore"},{"key":"10_CR58","first-page":"967","volume-title":"Artificial Neural Networks","author":"J. Schmidhuber","year":"1991","unstructured":"Schmidhuber, J. (1991b). Learning to generate sub-goals for action sequences. In Kohonen, T., M\u00e4kisara, K., Simula, O., & Kangas, J. (Eds.), Artificial Neural Networks, pp. 967\u2013972. Elsevier Science Publishers B.V., North-Holland."},{"key":"10_CR59","first-page":"500","volume-title":"Advances in Neural Information Processing Systems","author":"J. Schmidhuber","year":"1991","unstructured":"Schmidhuber, J. (1991c). Reinforcement learning in Markovian and non-Markovian environments. In Lippman, D. S., Moody, J. E., & Touretzky, D. S. (Eds.), Advances in Neural Information Processing Systems 3, pp. 500\u2013506. San Mateo, CA: Morgan Kaufmann."},{"key":"10_CR60","first-page":"488","volume-title":"Machine Learning: Proceedings of the Twelfth International Conference","author":"J. Schmidhuber","year":"1995","unstructured":"Schmidhuber, J. (1995). 
Discovering solutions with low Kolmogorov complexity and high generalization capability. In Prieditis, A., & Russell, S. (Eds.), Machine Learning: Proceedings of the Twelfth International Conference, pp. 488\u2013496. Morgan Kaufmann Publishers, San Francisco, CA."},{"issue":"5","key":"10_CR61","doi-asserted-by":"publisher","first-page":"857","DOI":"10.1016\/S0893-6080(96)00127-X","volume":"10","author":"J. Schmidhuber","year":"1997","unstructured":"Schmidhuber, J. (1997). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5), 857\u2013873.","journal-title":"Neural Networks"},{"key":"10_CR62","first-page":"1612","volume-title":"Congress on Evolutionary Computation","author":"J. Schmidhuber","year":"1999","unstructured":"Schmidhuber, J. (1999). Artificial curiosity based on discovering novel algorithmic predictability through coevolution. In Angeline, P., Michalewicz, Z., Schoenauer, M., Yao, X., & Zalzala, Z. (Eds.), Congress on Evolutionary Computation, pp. 1612\u20131618. IEEE Press, Piscataway, NJ."},{"issue":"4","key":"10_CR63","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1162\/neco.1993.5.4.625","volume":"5","author":"J. Schmidhuber","year":"1993","unstructured":"Schmidhuber, J., & Prelinger, D. (1993). Discovering predictable classifications. Neural Computation, 5(4), 625\u2013635.","journal-title":"Neural Computation"},{"key":"10_CR64","first-page":"119","volume-title":"AAAI Spring Symposium on Search under Uncertain and Incomplete Information, Stanford Univ.","author":"J. Schmidhuber","year":"1999","unstructured":"Schmidhuber, J., & Zhao, J. (1999). Direct policy search and uncertain policy evaluation. In AAAI Spring Symposium on Search under Uncertain and Incomplete Information, Stanford Univ., pp. 119\u2013124. 
American Association for Artificial Intelligence, Menlo Park, Calif."},{"key":"10_CR65","doi-asserted-by":"crossref","unstructured":"Schmidhuber, J., Zhao, J., & Schraudolph, N. (1997a). Reinforcement learning with self-modifying policies. In Thrun, S., & Pratt, L. (Eds.), Learning to learn, pp. 293\u2013309. Kluwer.","DOI":"10.1007\/978-1-4615-5529-2_12"},{"key":"10_CR66","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1023\/A:1007383707642","volume":"28","author":"J. Schmidhuber","year":"1997","unstructured":"Schmidhuber, J., Zhao, J., & Wiering, M. (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28, 105\u2013130.","journal-title":"Machine Learning"},{"key":"10_CR67","volume-title":"Numerische Optimierung von Computer-Modellen","author":"H. P. Schwefel","year":"1974","unstructured":"Schwefel, H. P. (1974). Numerische Optimierung von Computer-Modellen. Dissertation. Published 1977 by Birkh\u00e4user, Basel."},{"key":"10_CR68","unstructured":"Schwefel, H. P. (1995). Evolution and Optimum Seeking. Wiley Interscience."},{"key":"10_CR69","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"XXVII","author":"C. E. Shannon","year":"1948","unstructured":"Shannon, C. E. (1948). A mathematical theory of communication (parts I and II). Bell System Technical Journal, XXVII, 379\u2013423.","journal-title":"Bell System Technical Journal"},{"key":"10_CR70","first-page":"251","volume-title":"Advances in Neural Information Processing Systems","author":"S. Singh","year":"1992","unstructured":"Singh, S. (1992). The efficient learning of multiple task sequences. In Moody, J., Hanson, S., & Lippman, R. (Eds.), Advances in Neural Information Processing Systems 4, pp. 251\u2013258 San Mateo, CA. 
Morgan Kaufmann."},{"key":"10_CR71","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/S0019-9958(64)90223-2","volume":"7","author":"R. Solomonoff","year":"1964","unstructured":"Solomonoff, R. (1964). A formal theory of inductive inference. Part I. Information and Control, 7, 1\u201322.","journal-title":"Information and Control"},{"key":"10_CR72","doi-asserted-by":"crossref","unstructured":"Solomonoff, R. (1986). An application of algorithmic probability to problems in artificial intelligence. In Kanal, L. N., & Lemmer, J. F. (Eds.), Uncertainty in Artificial Intelligence, pp. 473\u2013491. Elsevier Science Publishers.","DOI":"10.1016\/B978-0-444-70058-2.50040-1"},{"key":"10_CR73","first-page":"159","volume-title":"Proceedings of the International Conference on Artificial Neural Networks, Paris","author":"J. Storck","year":"1995","unstructured":"Storck, J., Hochreiter, S., & Schmidhuber, J. (1995). Reinforcement driven information acquisition in non-deterministic environments. In Proceedings of the International Conference on Artificial Neural Networks, Paris, Vol. 2, pp. 159\u2013164. EC2 & Cie, Paris."},{"key":"10_CR74","doi-asserted-by":"crossref","unstructured":"Sun, R., & Sessions, C. (2000). Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors. IEEE Transactions on Systems, Man, and Cybernetics: Part B Cybernetics, 30(3).","DOI":"10.1109\/3477.846230"},{"key":"10_CR75","first-page":"9","volume":"3","author":"R. S. Sutton","year":"1988","unstructured":"Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9\u201344.","journal-title":"Machine Learning"},{"key":"10_CR76","first-page":"531","volume-title":"Machine Learning: Proceedings of the Twelfth International Conference","author":"R. S. Sutton","year":"1995","unstructured":"Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. In Prieditis, A., & Russell, S. 
(Eds.), Machine Learning: Proceedings of the Twelfth International Conference, pp. 531\u2013539. Morgan Kaufmann Publishers, San Francisco, CA."},{"key":"10_CR77","unstructured":"Sutton, R. S., & Pinette, B. (1985). The learning of world models by connectionist networks. Proceedings of the 7th Annual Conference of the Cognitive Science Society, 54\u201364."},{"key":"10_CR78","unstructured":"Sutton, R. S., Singh, S., Precup, D., & Ravindran, B. (1999). Improved switching among temporally abstract actions. In Advances in Neural Information Processing Systems 11. MIT Press. To appear."},{"key":"10_CR79","unstructured":"Teller, A. (1994). The evolution of mental models. In Kenneth E. Kinnear, J. (Ed.), Advances in Genetic Programming, pp. 199\u2013219. MIT Press."},{"issue":"2","key":"10_CR80","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1162\/neco.1994.6.2.215","volume":"6","author":"G. Tesauro","year":"1994","unstructured":"Tesauro, G. (1994). TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215\u2013219.","journal-title":"Neural Computation"},{"issue":"4","key":"10_CR81","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/0921-8890(95)00005-Z","volume":"15","author":"C. Tham","year":"1995","unstructured":"Tham, C. (1995). Reinforcement learning of multiple tasks using a hierarchical CMAC architecture. Robotics and Autonomous Systems, 15(4), 247\u2013274.","journal-title":"Robotics and Autonomous Systems"},{"key":"10_CR82","first-page":"531","volume-title":"Advances in Neural Information Processing Systems","author":"S. Thrun","year":"1992","unstructured":"Thrun, S., & M\u00f6ller, K. (1992). Active exploration in dynamic environments. In Lippman, D. S., Moody, J. E., & Touretzky, D. S. (Eds.), Advances in Neural Information Processing Systems 4, pp. 531\u2013538. San Mateo, CA: Morgan Kaufmann."},{"key":"10_CR83","unstructured":"Wang, G., & Mahadevan, S. (1998). 
A greedy divide-and-conquer approach to optimizing large manufacturing systems using reinforcement learning. In NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning."},{"key":"10_CR84","first-page":"279","volume":"8","author":"C. J. C. H. Watkins","year":"1992","unstructured":"Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279\u2013292.","journal-title":"Machine Learning"},{"key":"10_CR85","volume-title":"Learning from Delayed Rewards","author":"C. Watkins","year":"1989","unstructured":"Watkins, C. (1989). Learning from Delayed Rewards. Ph.D. thesis, King\u2019s College, Cambridge."},{"key":"10_CR86","first-page":"1335","volume":"2","author":"G. Weiss","year":"1994","unstructured":"Weiss, G. (1994). Hierarchical chunking in classifier systems. In Proceedings of the 12th National Conference on Artificial Intelligence, Vol. 2, pp. 1335\u20131340. AAAIPress\/The MIT Press.","journal-title":"Proceedings of the 12th National Conference on Artificial Intelligence"},{"key":"10_CR87","doi-asserted-by":"crossref","unstructured":"Weiss, G., & Sen, S. (Eds.). (1996). Adaption and Learning in Multi-Agent Systems. LNAI 1042, Springer.","DOI":"10.1007\/3-540-60923-7"},{"issue":"2","key":"10_CR88","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1177\/105971239700600202","volume":"6","author":"M. Wiering","year":"1998","unstructured":"Wiering, M., & Schmidhuber, J. (1998). HQ-learning. Adaptive Behavior, 6(2), 219\u2013246.","journal-title":"Adaptive Behavior"},{"key":"10_CR89","first-page":"534","volume-title":"Machine Learning: Proceedings of the Thirteenth International Conference","author":"M. Wiering","year":"1996","unstructured":"Wiering, M., & Schmidhuber, J. (1996). Solving POMDPs with Levin search and EIRA. In Saitta, L. (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference, pp. 534\u2013542. 
Morgan Kaufmann Publishers, San Francisco, CA."},{"key":"10_CR90","first-page":"229","volume":"8","author":"R. J. Williams","year":"1992","unstructured":"Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229\u2013256.","journal-title":"Machine Learning"},{"key":"10_CR91","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/evco.1994.2.1.1","volume":"2","author":"S. Wilson","year":"1994","unstructured":"Wilson, S. (1994). ZCS: A zeroth level classifier system. Evolutionary Computation, 2, 1\u201318.","journal-title":"Evolutionary Computation"},{"issue":"2","key":"10_CR92","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1162\/evco.1995.3.2.149","volume":"3","author":"S. Wilson","year":"1995","unstructured":"Wilson, S. (1995). Classifier fitness based on accuracy. Evolutionary Computation, 3(2), 149\u2013175.","journal-title":"Evolutionary Computation"},{"key":"10_CR93","volume-title":"Advances in Neural Information Processing Systems","author":"D. H. Wolpert","year":"1999","unstructured":"Wolpert, D. H., Tumer, K., & Frank, J. (1999). Using collective intelligence to route internet traffic. In Kearns, M., Solla, S. A., & Cohn, D. (Eds.), Advances in Neural Information Processing Systems 12. 
MIT Press, Cambridge MA."}],"container-title":["Lecture Notes in Computer Science","Sequence Learning"],"original-title":[],"link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/3-540-44565-X_10","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,17]],"date-time":"2024-02-17T10:11:36Z","timestamp":1708164696000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/3-540-44565-X_10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2000]]},"ISBN":["9783540415978","9783540445654"],"references-count":93,"URL":"https:\/\/doi.org\/10.1007\/3-540-44565-x_10","relation":{},"ISSN":["0302-9743"],"issn-type":[{"type":"print","value":"0302-9743"}],"subject":[],"published":{"date-parts":[[2000]]}}}