{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T01:58:02Z","timestamp":1778810282790,"version":"3.51.4"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"3-4","license":[{"start":{"date-parts":[[1992,5,1]],"date-time":"1992-05-01T00:00:00Z","timestamp":704678400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[1992,5]]},"DOI":"10.1007\/bf00992699","type":"journal-article","created":{"date-parts":[[2005,1,9]],"date-time":"2005-01-09T16:35:16Z","timestamp":1105288516000},"page":"293-321","source":"Crossref","is-referenced-by-count":751,"title":["Self-improving reactive agents based on reinforcement learning, planning and teaching"],"prefix":"10.1007","volume":"8","author":[{"given":"Long-Ji","family":"Lin","sequence":"first","affiliation":[]}],"member":"297","reference":[{"key":"CR1","doi-asserted-by":"crossref","unstructured":"Anderson, C.W. (1987). Strategy learning with multilayer connectionist representations. Proceedings of the Fourth International Workshop on Machine Learning (pp. 103\u2013114).","DOI":"10.1016\/B978-0-934613-41-5.50014-3"},{"key":"CR2","unstructured":"Barto, A.G., Sutton, R.S., & Watkins, C.J.C.H. (1990). Learning and sequential decision making. In: M. Gabriel & J.W. Moore (Eds.), Learning and computational neuroscience. MIT Press."},{"key":"CR3","unstructured":"Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming. (Technical Report 91-57). University of Massachusetts, Computer Science Department."},{"key":"CR4","unstructured":"Chapman, D. & Kaelbling, L.P. (1991). 
Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of IJCAI-91."},{"key":"CR5","first-page":"341","volume":"8","author":"P. Dayan","year":"1992","unstructured":"Dayan, P. (1992). The convergence of TD(\u03bb) for general \u03bb. Machine Learning, 8, 341\u2013362.","journal-title":"Machine Learning"},{"key":"CR6","first-page":"355","volume":"5","author":"J.J. Grefenstette","year":"1990","unstructured":"Grefenstette, J.J., Ramsey, C.L., & Schultz, A.C. (1990). Learning sequential decision rules using simulation models and competition. Machine Learning, 5, 355\u2013382.","journal-title":"Machine Learning"},{"key":"CR7","unstructured":"Hinton, G.E., McClelland, J.L., & Rumelhart, D.E. (1986). Distributed representations. Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1, Bradford Books\/MIT Press."},{"key":"CR8","volume-title":"Dynamic programming and Markov processes","author":"R.A. Howard","year":"1960","unstructured":"Howard, R.A. (1960). Dynamic programming and Markov processes. Wiley, New York."},{"key":"CR9","unstructured":"Kaelbling, L.P. (1990). Learning in embedded systems. Ph.D. Thesis, Department of Computer Science, Stanford University."},{"key":"CR10","unstructured":"Lang, K.J. (1989). A time-delay neural network architecture for speech recognition. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University."},{"key":"CR11","doi-asserted-by":"crossref","unstructured":"Lin, Long-Ji. (1991a). Self-improving reactive agents: Case studies of reinforcement learning frameworks. Proceedings of the First International Conference on Simulation of Adaptive Behavior: From Animals to Animats (pp. 297\u2013305). Also Technical Report CMU-CS-90-109, Carnegie Mellon University.","DOI":"10.7551\/mitpress\/3115.003.0041"},{"key":"CR12","doi-asserted-by":"crossref","unstructured":"Lin, Long-Ji. (1991b). 
Self-improvement based on reinforcement learning, planning and teaching. Proceedings of the Eighth International Workshop on Machine Learning (pp. 323\u2013327).","DOI":"10.1016\/B978-1-55860-200-7.50067-2"},{"key":"CR13","unstructured":"Lin, Long-Ji. (1991c). Programming robots using reinforcement learning and teaching. Proceedings of AAAI-91 (pp. 781\u2013786)."},{"key":"CR14","doi-asserted-by":"crossref","unstructured":"Mahadevan, S. & Connell, J. (1991). Scaling reinforcement learning to robotics by exploiting the subsumption architecture. Proceedings of the Eighth International Workshop on Machine Learning (pp. 328\u2013332).","DOI":"10.1016\/B978-1-55860-200-7.50068-4"},{"key":"CR15","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/0004-3702(82)90040-6","volume":"18","author":"T.M. Mitchell","year":"1982","unstructured":"Mitchell, T.M. (1982). Generalization as search. Artificial Intelligence, 18, 203\u2013226.","journal-title":"Artificial Intelligence"},{"key":"CR16","doi-asserted-by":"crossref","unstructured":"Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Workshop on Machine Learning (pp. 333\u2013337).","DOI":"10.1016\/B978-1-55860-200-7.50069-6"},{"key":"CR17","unstructured":"Mozer, M.C. (1986). RAMBOT: A connectionist expert system that learns by example. (Institute for Cognitive Science Report 8610). University of California at San Diego."},{"key":"CR18","unstructured":"Pomerleau, D.A. (1989). ALVINN: An autonomous land vehicle in a neural network (Technical Report CMU-CS-89-107). Carnegie Mellon University."},{"key":"CR19","unstructured":"Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1. Bradford Books\/MIT Press."},{"key":"CR20","unstructured":"Sutton, R.S. 
(1984). Temporal credit assignment in reinforcement learning. Ph.D. Thesis, Dept. of Computer and Information Science, University of Massachusetts."},{"key":"CR21","first-page":"9","volume":"3","author":"R.S. Sutton","year":"1988","unstructured":"Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9\u201344.","journal-title":"Machine Learning"},{"key":"CR22","doi-asserted-by":"crossref","unstructured":"Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Workshop on Machine Learning (pp. 216\u2013224).","DOI":"10.1016\/B978-1-55860-141-3.50030-4"},{"key":"CR23","doi-asserted-by":"crossref","unstructured":"Tan, Ming. (1991). Learning a cost-sensitive internal representation for reinforcement learning. Proceedings of the Eighth International Workshop on Machine Learning (pp. 358\u2013362).","DOI":"10.1016\/B978-1-55860-200-7.50074-X"},{"key":"CR24","unstructured":"Thrun, S.B., M\u00f6ller, K., & Linden, A. (1991). Planning with an adaptive world model. In D.S. Touretzky (Ed.), Advances in neural information processing systems 3, Morgan Kaufmann."},{"key":"CR25","unstructured":"Thrun, S.B. & M\u00f6ller, K. (1992). Active exploration in dynamic environments. To appear in D.S. Touretzky (Ed.), Advances in neural information processing systems 4, Morgan Kaufmann."},{"key":"CR26","volume-title":"Learning from delayed rewards","author":"C.J.C.H. Watkins","year":"1989","unstructured":"Watkins, C.J.C.H. (1989). Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge."},{"key":"CR27","unstructured":"Williams, R.J. & Zipser, D. (1988). A learning algorithm for continually running fully recurrent neural networks (Institute for Cognitive Science Report 8805). University of California at San Diego."},{"key":"CR28","doi-asserted-by":"crossref","unstructured":"Whitehead, S.D. & Ballard, D.H. (1989). 
A role for anticipation in reactive systems that learn. Proceedings of the Sixth International Workshop on Machine Learning (pp. 354\u2013357).","DOI":"10.1016\/B978-1-55860-036-2.50090-4"},{"key":"CR29","first-page":"45","volume":"7","author":"S.D. Whitehead","year":"1991","unstructured":"Whitehead, S.D. & Ballard, D.H. (1991a). Learning to perceive and act by trial and error. Machine Learning, 7, 45\u201383.","journal-title":"Machine Learning"},{"key":"CR30","doi-asserted-by":"crossref","unstructured":"Whitehead, S.D. (1991b). Complexity and cooperation in Q-learning. Proceedings of the Eighth International Workshop on Machine Learning (pp. 363\u2013367).","DOI":"10.1016\/B978-1-55860-200-7.50075-1"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00992699.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/BF00992699\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00992699","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,19]],"date-time":"2024-01-19T16:44:32Z","timestamp":1705682672000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/BF00992699"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1992,5]]},"references-count":30,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[1992,5]]}},"alternative-id":["BF00992699"],"URL":"https:\/\/doi.org\/10.1007\/bf00992699","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[1992,5]]}}}