{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T15:25:44Z","timestamp":1764602744626},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[1996,10,1]],"date-time":"1996-10-01T00:00:00Z","timestamp":844128000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[1996,10]]},"DOI":"10.1007\/bf00115298","type":"journal-article","created":{"date-parts":[[2004,11,1]],"date-time":"2004-11-01T02:11:46Z","timestamp":1099275106000},"page":"5-22","source":"Crossref","is-referenced-by-count":51,"title":["Exploration bonuses and dual control"],"prefix":"10.1007","volume":"25","author":[{"given":"Peter","family":"Dayan","sequence":"first","affiliation":[]},{"given":"Terrence J.","family":"Sejnowski","sequence":"additional","affiliation":[]}],"member":"297","reference":[{"key":"CR1","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/0004-3702(94)00011-O","volume":"72","author":"A.G. Barto","year":"1995","unstructured":"BartoA.G., BradtkeS.J. & SinghS.P. (1995). Learning to act using real-time dynamic programming.Artificial Intelligence,72, 81?138.","journal-title":"Artificial Intelligence"},{"key":"CR2","volume-title":"Learning and Computational Neuroscience: Foundations of Adaptive Networks","author":"A.G. Barto","year":"1989","unstructured":"BartoA.G., SuttonR.S. & WatkinsC.J.C.H. (1989). Learning and sequential decision making. In MGabriel & JMoore, editors,Learning and Computational Neuroscience: Foundations of Adaptive Networks. Cambridge, MA: MIT Press, Bradford Books."},{"key":"CR3","volume-title":"Stochastic Optimal Control: The Discrete Time Case","author":"D. Bertsekas","year":"1978","unstructured":"BertsekasD. & ShreveS.E. (1978).Stochastic Optimal Control: The Discrete Time Case. New York, NY: Academic Press."},{"key":"CR4","first-page":"679","volume-title":"Advances in Neural Information Processing Systems, 6","author":"D.A. Cohn","year":"1994","unstructured":"CohnD.A. (1994). Neural network exploration using optimal experiment design. In JDCowan, GTesauro & JAllspector, editors,Advances in Neural Information Processing Systems, 6. San Mateo, CA: Morgan Kaufmann, 679?686."},{"key":"CR5","series-title":"Techical Report","doi-asserted-by":"crossref","DOI":"10.21236\/AD0612601","volume-title":"Markov Decision Processes with Uncertain Transition Probabilities","author":"J.M. Cozzolino","year":"1965","unstructured":"CozzolinoJ.M., Gonzalez-ZubietaR. & MillerR. (1965).Markov Decision Processes with Uncertain Transition Probabilities. Techical Report 11, Operations Research Center, MIT, Cambridge."},{"key":"CR6","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1109\/TAC.1981.1102774","volume":"26","author":"P.L. Dersin","year":"1981","unstructured":"DersinP.L., AthansM. & KendrickD.A. (1981). Some properties of the dual adaptive stochastic control algorithm.IEEE Transactions on Automatic Control,26, 1001?1008.","journal-title":"IEEE Transactions on Automatic Control"},{"key":"CR7","volume-title":"Dynamic Programming and the Calculus of Variations","author":"S.E. Dreyfus","year":"1965","unstructured":"DreyfusS.E. (1965).Dynamic Programming and the Calculus of Variations. New York, NY: Academic Press."},{"key":"CR8","volume-title":"Theory of Optimal Experiments","author":"V. Fedorov","year":"1972","unstructured":"FedorovV. (1972).Theory of Optimal Experiments. New York: Academic Press."},{"key":"CR9","volume-title":"Optimal Control Systems","author":"A.A. Fe'ldbaum","year":"1965","unstructured":"Fe'ldbaumA.A. (1965).Optimal Control Systems. New York, NY: Academic Press."},{"key":"CR10","volume-title":"Dynamic Programming and Markov Processes","author":"R.A. Howard","year":"1960","unstructured":"HowardR.A. (1960).Dynamic Programming and Markov Processes. New York, NY: Technology Press & Wiley."},{"key":"CR11","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1137\/0323023","volume":"23","author":"P.R. Kumar","year":"1985","unstructured":"KumarP.R. (1985). A survey of some results in stochastic adaptive control.SIAM Journal on Control and Optimization,23, 329?380.","journal-title":"SIAM Journal on Control and Optimization"},{"key":"CR12","unstructured":"Littman, M.L. (1996).Algorithms for Sequential Decision Making. Ph.D., Department of Computer Science, Brown University."},{"key":"CR13","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/BF02055574","volume":"28","author":"W.S. Lovejoy","year":"1991","unstructured":"LovejoyW.S. (1991). A survey of algorithmic methods for partially observed Markov decision processes.Annals of Operations Research,28, 47?66.","journal-title":"Annals of Operations Research"},{"key":"CR14","unstructured":"Meier, L., IIIrd (1965). Combined optimal control and estimation.Proceedings of the Third Annual Allerton Conference on Circuit and System Theory."},{"key":"CR15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1287\/mnsc.28.1.1","volume":"28","author":"G.E. Monahan","year":"1982","unstructured":"MonahanG.E. (1982). A survey of partially observable Markov decision processes: Theory, models and algorithms.Management Science,28, 1?16.","journal-title":"Management Science"},{"key":"CR16","first-page":"103","volume":"13","author":"A.W. Moore","year":"1993","unstructured":"MooreA.W. & AtkesonC.G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time.Machine Learning,13, 103?130.","journal-title":"Machine Learning"},{"key":"CR17","volume-title":"Advances in Neural Information Processing Systems, 6","author":"A.W. Moore","year":"1994","unstructured":"MooreA.W. & AtkesonC.G. (1994). The Parti-Game algorithm. In GTesauro, JDCowan & JAlspector, editors,Advances in Neural Information Processing Systems, 6. San Mateo, CA: Morgan Kaufmann."},{"key":"CR18","unstructured":"Peng, J. & Williams, R.J. (1992).Efficient search control in DYNA. College of Computer Science, Northeastern University."},{"key":"CR19","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1137\/0308040","volume":"8","author":"R.W. Rishel","year":"1970","unstructured":"RishelR.W. (1970). Necessary and sufficient dynamic programming conditions for continuous time stochastic optimal control.SIAM Journal of Control,8, 559?571.","journal-title":"SIAM Journal of Control"},{"key":"CR20","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1109\/TAC.1982.1102893","volume":"27","author":"M. Sato","year":"1982","unstructured":"SatoM., AbeK. & TakedaH. (1982). Learning control of finite Markov chains with unknown transition probabilities.IEEE Transactions on Automatic Control,27, 502?505.","journal-title":"IEEE Transactions on Automatic Control"},{"key":"CR21","series-title":"Technical Report FKI-149-91","volume-title":"Adaptive Confidence and Adaptive Curiosity","author":"J.H. Schmidhuber","year":"1991","unstructured":"SchmidhuberJ.H. (1991).Adaptive Confidence and Adaptive Curiosity. (Technical Report FKI-149?91). Technische Universit\u00e4t M\u00fcnchen, Germany."},{"key":"CR22","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1016\/0022-247X(65)90027-2","volume":"12","author":"C.T. Striebel","year":"1965","unstructured":"StriebelC.T. (1965). Sufficient statistics in the optimal control of stochastic systems.Journal of Mathematical Analysis and Applications,12, 576?592.","journal-title":"Journal of Mathematical Analysis and Applications"},{"key":"CR23","doi-asserted-by":"crossref","unstructured":"Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming.Machine Learning: Proceedings of the Seventh International Conference, 216?224.","DOI":"10.1016\/B978-1-55860-141-3.50030-4"},{"key":"CR24","volume-title":"Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches","author":"S.B. Thrun","year":"1992","unstructured":"ThrunS.B. (1992). The role of exploration in learning control. In D.A.White & D.A.Sofge, editors,Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. New York, NY: Van Nostrand Reinhold."},{"key":"CR25","first-page":"531","volume-title":"Advances in Neural Information Processing Systems, 4","author":"S.B. Thrun","year":"1992","unstructured":"ThrunS.B. & M\u00f6llerK. (1992). Active exploration in dynamic environments. In J.E.Moody, S.J.Hanson & R.P.Lippmann, editorsAdvances in Neural Information Processing Systems, 4, 531?538. San Mateo, CA: Morgan Kaufmann."},{"key":"CR26","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1109\/TAC.1973.1100242","volume":"18","author":"E. Tse","year":"1973","unstructured":"Tse E. & Bar-Shalom Y. (1973). An actively adaptive control for linear systems with random parameters via the dual control approach.IEEE Transactions on Automatic Control,18, 109?117.","journal-title":"IEEE Transactions on Automatic Control"},{"key":"CR27","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1109\/TAC.1973.1100238","volume":"18","author":"E. Tse","year":"1973","unstructured":"Tse E., Bar-Shalom Y. & MeierLIIIrd (1973). Wide-sense adaptive dual control for nonlinear stochastic systems.IEEE Transactions on Automatic Control,18, 98?108.","journal-title":"IEEE Transactions on Automatic Control"},{"key":"CR28","unstructured":"Watkins, C.J.C.H. (1989).Learning from Delayed Rewards. PhD Thesis, Department of Psychology, University of Cambridge, England."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00115298.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/BF00115298\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00115298","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,8]],"date-time":"2019-04-08T14:09:21Z","timestamp":1554732561000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/BF00115298"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1996,10]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[1996,10]]}},"alternative-id":["BF00115298"],"URL":"https:\/\/doi.org\/10.1007\/bf00115298","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[1996,10]]}}}