{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T15:19:59Z","timestamp":1772205599778,"version":"3.50.1"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2008,7,4]],"date-time":"2008-07-04T00:00:00Z","timestamp":1215129600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2009,2]]},"DOI":"10.1007\/s10458-008-9056-7","type":"journal-article","created":{"date-parts":[[2008,7,3]],"date-time":"2008-07-03T01:48:13Z","timestamp":1215049693000},"page":"83-105","source":"Crossref","is-referenced-by-count":46,"title":["Learning and planning in environments with delayed feedback"],"prefix":"10.1007","volume":"18","author":[{"given":"Thomas J.","family":"Walsh","sequence":"first","affiliation":[]},{"given":"Ali","family":"Nouri","sequence":"additional","affiliation":[]},{"given":"Lihong","family":"Li","sequence":"additional","affiliation":[]},{"given":"Michael L.","family":"Littman","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,7,4]]},"reference":[{"key":"9056_CR1","doi-asserted-by":"crossref","unstructured":"Altman, E., & Nain, P. Closed-loop control with delayed information. In Proceedings of the ACM SIGMETRICS and Performance 1\u20135, pp. 193\u2013204.","DOI":"10.1145\/133057.133106"},{"issue":"1\u20135","key":"9056_CR2","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1023\/A:1006511328852","volume":"11","author":"C.G. Atkeson","year":"1997","unstructured":"Atkeson C.G., Moore A.W., Schaal S. (1997) Locally weighted learning for control. 
Artificial Intelligence Review 11(1\u20135): 75\u2013113","journal-title":"Artificial Intelligence Review"},{"key":"9056_CR3","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1057\/palgrave.jors.2600745","volume":"50","author":"J.L. Bander","year":"1999","unstructured":"Bander J.L., White C.C. III (1999) Markov decision processes with noise-corrupted and delayed state observations. Journal of the Operational Research Society 50: 660\u2013668","journal-title":"Journal of the Operational Research Society"},{"key":"9056_CR4","unstructured":"Bertsekas, D. P. (2001). Dynamic programming and optimal control (2nd ed., Vol. 1\/2). Athena Scientific."},{"key":"9056_CR5","unstructured":"Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in neural information processing systems: Proceedings of the 1994 conference (pp. 369\u2013376). Cambridge, MA: MIT Press."},{"key":"9056_CR6","first-page":"213","volume":"3","author":"R.I. Brafman","year":"2002","unstructured":"Brafman R.I., Tennenholtz M. (2002) R-max\u2014A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3: 213\u2013231","journal-title":"Journal of Machine Learning Research"},{"issue":"4","key":"9056_CR7","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1287\/opre.20.4.904","volume":"20","author":"D.M. Brooks","year":"1972","unstructured":"Brooks D.M., Leondes C.T. (1972) Markov decision processes with state-information lag. Operations Research 20(4): 904\u2013907","journal-title":"Operations Research"},{"key":"9056_CR8","unstructured":"Fox, R., & Tennenholtz, M. (2007). A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs. In Proceedings of the 22nd Conference on Artificial Intelligence, pp. 
553\u2013558."},{"issue":"301","key":"9056_CR9","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1080\/01621459.1963.10500830","volume":"58","author":"W. Hoeffding","year":"1963","unstructured":"Hoeffding W. (1963) Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301): 13\u201330","journal-title":"Journal of the American Statistical Association"},{"key":"9056_CR10","unstructured":"Jong, N. K., & Stone, P. (2006). Kernel-based models for reinforcement learning. In Proceedings of the 2006 ICML Kernel Machines and Reinforcement Learning Workshop."},{"issue":"1\u20132","key":"9056_CR11","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","volume":"101","author":"L.P. Kaelbling","year":"1998","unstructured":"Kaelbling L.P., Littman M.L., Cassandra A.R. (1998) Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1\u20132): 99\u2013134","journal-title":"Artificial Intelligence"},{"key":"9056_CR12","unstructured":"Kakade, S. (2003). On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, UK."},{"key":"9056_CR13","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1109\/TAC.2003.809799","volume":"48","author":"K.V. Katsikopoulos","year":"2003","unstructured":"Katsikopoulos K.V., Engelbrecht S.E. (2003) Markov decision processes with delays and asynchronous cost collection. IEEE Transactions on Automatic Control 48: 568\u2013574","journal-title":"IEEE Transactions on Automatic Control"},{"key":"9056_CR14","unstructured":"Lin, L.-J. (1993). Reinforcement Learning for Robots using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh, PA."},{"key":"9056_CR15","unstructured":"Littman, M. L. (1996). Algorithms for sequential decision making. PhD thesis, Brown University, Providence, RI, 1996."},{"key":"9056_CR16","unstructured":"Loch, J., & Singh, S. (1998). 
Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the 15th International Conference on Machine Learning, pp. 323\u2013331."},{"key":"9056_CR17","unstructured":"Munos, R., & Moore, A. W. (2000). Rates of convergence for variable resolution schemes in optimal control. In Proceedings of the 17th International Conference on Machine Learning, pp. 647\u2013654."},{"key":"9056_CR18","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1023\/A:1017928328829","volume":"49","author":"D. Ormoneit","year":"2002","unstructured":"Ormoneit D., Sen \u015a. (2002) Kernel-based reinforcement learning. Machine Learning 49: 161\u2013178","journal-title":"Machine Learning"},{"issue":"3","key":"9056_CR19","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1287\/moor.12.3.441","volume":"12","author":"C.H. Papadimitriou","year":"1987","unstructured":"Papadimitriou C.H., Tsitsiklis J.N. (1987) The complexity of Markov decision processes. Mathematics of Operations Research 12(3): 441\u2013450","journal-title":"Mathematics of Operations Research"},{"key":"9056_CR20","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316887","volume-title":"Markov decision processes: Discrete stochastic dynamic programming","author":"M.L. Puterman","year":"1994","unstructured":"Puterman M.L. (1994) Markov decision processes: Discrete stochastic dynamic programming. Wiley, New York"},{"issue":"1\u20133","key":"9056_CR21","first-page":"123","volume":"22","author":"S.P. Singh","year":"1996","unstructured":"Singh S.P., Sutton R.S. (1996) Reinforcement learning with replacing eligibility traces. Machine Learning 22(1\u20133): 123\u2013158","journal-title":"Machine Learning"},{"issue":"3","key":"9056_CR22","first-page":"227","volume":"16","author":"S.P. Singh","year":"1994","unstructured":"Singh S.P., Yee R.C. (1994) An upper bound on the loss from approximate optimal-value functions. 
Machine Learning 16(3): 227\u2013233","journal-title":"Machine Learning"},{"key":"9056_CR23","doi-asserted-by":"crossref","unstructured":"Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006). PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pp. 881\u2013888.","DOI":"10.1145\/1143844.1143955"},{"key":"9056_CR24","first-page":"1038","volume-title":"Advances in neural information processing systems 8","author":"R.S. Sutton","year":"1996","unstructured":"Sutton R.S. (1996) Generalization in reinforcement learning: Successful examples using sparse coarse coding. In: Touretzky D.S., Mozer M.C., Hasselmo M.E. (Eds) Advances in neural information processing systems 8. MIT Press, Cambridge, MA, pp 1038\u20131045"},{"key":"9056_CR25","volume-title":"Reinforcement learning: An introduction","author":"R.S. Sutton","year":"1998","unstructured":"Sutton R.S., Barto A.G. (1998) Reinforcement learning: An introduction. MIT Press, Cambridge, MA"},{"key":"9056_CR26","unstructured":"Vijayakumar, S., & Schaal, S. (2000). Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In Proceedings of the 17th International Conference on Machine Learning, pp. 1079\u20131086."},{"key":"9056_CR27","doi-asserted-by":"crossref","unstructured":"Zubek, V. B., & Dietterich, T. G. (2000). A POMDP approximation algorithm that anticipates the need to observe. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, pp. 
521\u2013532.","DOI":"10.1007\/3-540-44533-1_53"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-008-9056-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10458-008-9056-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-008-9056-7","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,5,29]],"date-time":"2019-05-29T17:28:24Z","timestamp":1559150904000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10458-008-9056-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,4]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,2]]}},"alternative-id":["9056"],"URL":"https:\/\/doi.org\/10.1007\/s10458-008-9056-7","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,7,4]]}}}