{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T19:44:37Z","timestamp":1771703077573,"version":"3.50.1"},"reference-count":14,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[1994,9,1]],"date-time":"1994-09-01T00:00:00Z","timestamp":778377600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[1994,9]]},"DOI":"10.1007\/bf00993308","type":"journal-article","created":{"date-parts":[[2005,1,14]],"date-time":"2005-01-14T17:53:30Z","timestamp":1105725210000},"page":"227-233","source":"Crossref","is-referenced-by-count":27,"title":["An upper bound on the loss from approximate optimal-value functions"],"prefix":"10.1007","volume":"16","author":[{"given":"Satinder P.","family":"Singh","sequence":"first","affiliation":[]},{"given":"Richard C.","family":"Yee","sequence":"additional","affiliation":[]}],"member":"297","reference":[{"key":"CR1","volume-title":"Learning and Problem Solving with Multilayer Connectionist Systems","author":"C.W. Anderson","year":"1986","unstructured":"Anderson, C.W. (1986). Learning and Problem Solving with Multilayer Connectionist Systems. PhD thesis, University of Massachusetts, Department of Computer and Information Science, University of Massachusetts, Amherst, MA 01003."},{"key":"CR2","unstructured":"Barto, A.G., Bradtke, S.J., and Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming. Technical Report TR-91-57, Department of Computer Science, University of Massachusetts."},{"issue":"5","key":"CR3","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","volume":"13","author":"A.G. Barto","year":"1983","unstructured":"Barto, A.G., Sutton, R.S., and Anderson, C.W. (1983). 
Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13(5), 834–846.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"CR4","volume-title":"Learning and Computational Neuroscience: Foundations of Adaptive Networks, chapter 13","author":"A.G. Barto","year":"1990","unstructured":"Barto, A.G., Sutton, R.S., and Watkins, C.J.C.H. (1990). Learning and sequential decision making. In M. Gabriel and J. Moore (Eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks, chapter 13. Cambridge, MA: Bradford Books\/MIT Press."},{"key":"CR5","volume-title":"Dynamic programming: Deterministic and stochastic models","author":"D.P. Bertsekas","year":"1987","unstructured":"Bertsekas, D.P. (1987). Dynamic programming: Deterministic and stochastic models. Englewood Cliffs, NJ: Prentice Hall."},{"key":"CR6","volume-title":"Advances in Neural Information Processing Systems 5","author":"S.J. Bradtke","year":"1993","unstructured":"Bradtke, S.J. (1993). Reinforcement learning applied to linear quadratic regulation. In S.J. Hanson, J.D. Cowan, and C.L. Giles (Eds.), Advances in Neural Information Processing Systems 5, San Mateo, CA, IEEE, Morgan Kaufmann."},{"key":"CR7","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1287\/mnsc.18.1.7","volume":"19","author":"E. Porteus","year":"1971","unstructured":"Porteus, E. (1971). Some bounds for discounted sequential decision processes. Management Science, 19, 7–11.","journal-title":"Management Science"},{"key":"CR8","first-page":"9","volume":"3","author":"R.S. Sutton","year":"1988","unstructured":"Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.","journal-title":"Machine Learning"},{"key":"CR9","first-page":"216","volume-title":"Machine Learning: Proceedings of the Seventh International Conference (ML90)","author":"R.S. Sutton","year":"1990","unstructured":"Sutton, R.S. 
(1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In B.W. Porter and R.H. Mooney (Eds.), Machine Learning: Proceedings of the Seventh International Conference (ML90), pages 216–224. San Mateo, CA: Morgan Kaufmann."},{"issue":"3","key":"CR10","first-page":"257","volume":"8","author":"G. Tesauro","year":"1992","unstructured":"Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3\/4), 257–277.","journal-title":"Machine Learning"},{"issue":"3","key":"CR11","first-page":"279","volume":"8","author":"C.J.C.H. Watkins","year":"1992","unstructured":"Watkins, C.J.C.H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3\/4), 279–292.","journal-title":"Machine Learning"},{"key":"CR12","volume-title":"Learning from Delayed Rewards","author":"C.J.C.H. Watkins","year":"1989","unstructured":"Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, King's College, University of Cambridge, Cambridge, England."},{"issue":"1","key":"CR13","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/TSMC.1987.289329","volume":"17","author":"P.J. Werbos","year":"1987","unstructured":"Werbos, P.J. (1987). Building and understanding adaptive systems: A statistical\/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 17(1), 7–20.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"CR14","series-title":"Technical Report","volume-title":"Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems","author":"R.J. Williams","year":"1993","unstructured":"Williams, R.J. and Baird, L.C. (1993). Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems. 
Technical Report NU-CCS-93-11, Northeastern University, College of Computer Science, Boston, MA 02115."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00993308.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/BF00993308\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00993308","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,29]],"date-time":"2019-04-29T22:58:43Z","timestamp":1556578723000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/BF00993308"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1994,9]]},"references-count":14,"journal-issue":{"issue":"3","published-print":{"date-parts":[[1994,9]]}},"alternative-id":["BF00993308"],"URL":"https:\/\/doi.org\/10.1007\/bf00993308","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[1994,9]]}}}