{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T12:13:22Z","timestamp":1769084002000,"version":"3.49.0"},"publisher-location":"Berlin, Heidelberg","reference-count":17,"publisher":"Springer Berlin Heidelberg","isbn-type":[{"value":"9783540434726","type":"print"},{"value":"9783540460145","type":"electronic"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2002]]},"DOI":"10.1007\/3-540-46014-4_23","type":"book-chapter","created":{"date-parts":[[2007,5,30]],"date-time":"2007-05-30T22:48:17Z","timestamp":1180565297000},"page":"249-260","source":"Crossref","is-referenced-by-count":32,"title":["Least-Squares Methods in Reinforcement Learning for Control"],"prefix":"10.1007","author":[{"given":"Michail G.","family":"Lagoudakis","sequence":"first","affiliation":[]},{"given":"Ronald","family":"Parr","sequence":"additional","affiliation":[]},{"given":"Michael L.","family":"Littman","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2002,3,19]]},"reference":[{"key":"23_CR1","volume-title":"Neuro-Dynamic Programming","author":"D. Bertsekas","year":"1996","unstructured":"D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996."},{"issue":"1","key":"23_CR2","first-page":"33","volume":"22","author":"S. J. Bradtke","year":"1996","unstructured":"Steven J. Bradtke and Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1\/2\/3):33\u201357, 1996.","journal-title":"Machine Learning"},{"key":"23_CR3","unstructured":"Carlos Guestrin, Daphne Koller, and Ronald Parr. Multiagent planning with factored MDPs. In Proceeding of the 14th Neural Information Processing Systems (NIPS-14), Vancouver, Canada, December 2001."},{"key":"23_CR4","unstructured":"Carlos Guestrin, Michail G. Lagoudakis, and Ronald Parr. Coordinated reinforcement learning. In Proceedings of the 2002 AAAI Spring Symposium Series: Collaborative Learning Agents, Stanford, CA, March 2002."},{"key":"23_CR5","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","volume":"4","author":"L. P. Kaelbling","year":"1996","unstructured":"Leslie P. Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237\u2013285, 1996.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"23_CR6","unstructured":"Daphne Koller and Ronald Parr. Policy iteration for factored MDPs. In Craig Boutilier and Mois\u00e9s Goldszmidt, editors, Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI-00), pages 326\u2013334, San Francisco, CA, 2000. Morgan Kaufmann Publishers."},{"key":"23_CR7","unstructured":"Michail Lagoudakis and Ronald Parr. Model free least squares policy iteration. In Proceedings of the 14th Neural Information Processing Systems (NIPS-14), Vancouver, Canada, December 2001."},{"key":"23_CR8","unstructured":"Michail G. Lagoudakis and Michael L. Littman. Algorithm selection using reinforcement learning. In Pat Langley, editor, Proceedings of the Seventeenth International Conference on Machine Learning, pages 511\u2013518. Morgan Kaufmann, San Francisco, CA, 2000."},{"key":"23_CR9","doi-asserted-by":"crossref","unstructured":"Michail G. Lagoudakis and Michael L. Littman. Learning to select branching rules in the dpll procedure for satisfiability. In Henry Kautz and Bart Selman, editors, Electronic Notes in Discrete Mathematics (ENDM), Vol. 9, LICS 2001 Workshop on Theory and Applications of Satisfiability Testing. Elsevier Science, 2001.","DOI":"10.1016\/S1571-0653(04)00332-4"},{"key":"23_CR10","unstructured":"Michail G. Lagoudakis, Michael L. Littman, and Ronald Parr. Selecting the right algorithm. In Carla Gomes and Toby Walsh, editors, Proceedings of the 2001 AAAI Fall Symposium Series: Using Uncertainty within Computation, Cape Cod, MA, November 2001."},{"key":"23_CR11","doi-asserted-by":"crossref","unstructured":"Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157\u2013163, San Francisco, CA, 1994. Morgan Kaufmann.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"23_CR12","unstructured":"J. Randl\u00f8v and P. Alstr\u00f8m. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of The Fifteenth International Conference on Machine Learning, Madison, Wisconsin, July 1998. Morgan Kaufmann."},{"key":"23_CR13","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/S0065-2458(08)60520-3","volume":"15","author":"J. R. Rice","year":"1976","unstructured":"John R. Rice. The algorithm selection problem. Advances in Computers, 15:65\u2013118, 1976.","journal-title":"Advances in Computers"},{"key":"23_CR14","unstructured":"J. Schneider, W. Wong, A. Moore, and M. Riedmiller. Distributed value functions. In Proceedings of The Sixteenth International Conference on Machine Learning, Bled, Slovenia, July 1999. Morgan Kaufmann."},{"key":"23_CR15","volume-title":"Reinforcement Learning: An Introduction","author":"R. Sutton","year":"1998","unstructured":"R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MITPress, Cambridge, MA, 1998."},{"issue":"1","key":"23_CR16","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1109\/91.481841","volume":"4","author":"K. Wang","year":"1996","unstructured":"K. Wang, H. Tanaka and M. Griffin. An approach to fuzzy control of nonlinear systems: Stability and design issues. IEEE Transactions on Fuzzy Systems, 4(1):14\u201323, 1996.","journal-title":"IEEE Transactions on Fuzzy Systems"},{"key":"23_CR17","unstructured":"Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King\u2019s College, Cambridge, UK, 1989."}],"container-title":["Lecture Notes in Computer Science","Methods and Applications of Artificial Intelligence"],"original-title":[],"link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/3-540-46014-4_23","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,2,16]],"date-time":"2019-02-16T20:14:33Z","timestamp":1550348073000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/3-540-46014-4_23"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2002]]},"ISBN":["9783540434726","9783540460145"],"references-count":17,"URL":"https:\/\/doi.org\/10.1007\/3-540-46014-4_23","relation":{},"ISSN":["0302-9743"],"issn-type":[{"value":"0302-9743","type":"print"}],"subject":[],"published":{"date-parts":[[2002]]}}}