{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T01:24:23Z","timestamp":1773710663937,"version":"3.50.1"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2013,5,14]],"date-time":"2013-05-14T00:00:00Z","timestamp":1368489600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2013,6]]},"DOI":"10.1007\/s10994-013-5368-1","type":"journal-article","created":{"date-parts":[[2013,5,13]],"date-time":"2013-05-13T16:19:12Z","timestamp":1368461952000},"page":"325-349","source":"Crossref","is-referenced-by-count":52,"title":["Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model"],"prefix":"10.1007","volume":"91","author":[{"given":"Mohammad","family":"Gheshlaghi Azar","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"R\u00e9mi","family":"Munos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hilbert J.","family":"Kappen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2013,5,14]]},"reference":[{"key":"5368_CR1","unstructured":"Azar, M. G., Munos, R., Ghavamzadeh, M., & Kappen, H. J. (2011a). Reinforcement learning with a near optimal rate of convergence. Tech. rep. http:\/\/hal.inria.fr\/inria-00636615 ."},{"key":"5368_CR2","first-page":"2411","volume-title":"Advances in neural information processing systems","author":"M. G. Azar","year":"2011","unstructured":"Azar, M. G., Munos, R., Ghavamzadeh, M., & Kappen, H. J. (2011b). Speedy Q-learning. In Advances in neural information processing systems (Vol.\u00a024, pp. 
2411\u20132419)."},{"key":"5368_CR3","volume-title":"ICML","author":"M. G. Azar","year":"2012","unstructured":"Azar, M. G., Munos, R., Kappen, H. J. (2012). On the sample complexity of reinforcement learning with a generative model. In ICML. Omnipress."},{"key":"5368_CR4","first-page":"35","volume-title":"Proceedings of the 25th conference on uncertainty in artificial intelligence","author":"P. L. Bartlett","year":"2009","unstructured":"Bartlett, P. L., & Tewari, A. (2009). REGAL: a\u00a0regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the 25th conference on uncertainty in artificial intelligence (pp. 35\u201342)."},{"key":"5368_CR5","volume-title":"Dynamic programming and optimal control","author":"D. P. Bertsekas","year":"2007","unstructured":"Bertsekas, D. P. (2007). Dynamic programming and optimal control (Vol.\u00a0II, 3rd edn.). Belmont: Athena Scientific.","edition":"3"},{"key":"5368_CR6","volume-title":"Neuro-dynamic programming","author":"D. P. Bertsekas","year":"1996","unstructured":"Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont: Athena Scientific."},{"key":"5368_CR7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511546921","volume-title":"Prediction, learning, and games","author":"N. Cesa-Bianchi","year":"2006","unstructured":"Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press."},{"key":"5368_CR8","first-page":"1079","volume":"7","author":"E. Even-Dar","year":"2006","unstructured":"Even-Dar, E., Mannor, S., & Mansour, Y. (2006). Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7, 1079\u20131105.","journal-title":"Journal of Machine Learning Research"},{"key":"5368_CR9","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1016\/0020-0190(90)90214-I","volume":"33","author":"L. 
Hagerup","year":"1990","unstructured":"Hagerup, L., & R\u00fcb, C. (1990). A guided tour of Chernoff bounds. Information Processing Letters, 33, 305\u2013308.","journal-title":"Information Processing Letters"},{"key":"5368_CR10","first-page":"1563","volume":"11","author":"T. Jaksch","year":"2010","unstructured":"Jaksch, T., Ortner, R., & Auer, P. (2010). Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11, 1563\u20131600.","journal-title":"Journal of Machine Learning Research"},{"key":"5368_CR11","unstructured":"Kakade, S. M. (2004). On the sample complexity of reinforcement learning. Ph.D. thesis, Gatsby Computational Neuroscience Unit."},{"key":"5368_CR12","first-page":"996","volume-title":"Advances in neural information processing systems","author":"M. Kearns","year":"1999","unstructured":"Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in neural information processing systems (Vol.\u00a012, pp. 996\u20131002). Cambridge: MIT Press."},{"key":"5368_CR13","doi-asserted-by":"crossref","unstructured":"Lattimore, T., & Hutter, M. (2012a). PAC bounds for discounted MDPs. CoRR arXiv:1202.3890 .","DOI":"10.1007\/978-3-642-34106-9_26"},{"key":"5368_CR14","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1007\/978-3-642-34106-9_26","volume-title":"Algorithmic learning theory","author":"T. Lattimore","year":"2012","unstructured":"Lattimore, T., & Hutter, M. (2012b). PAC bounds for discounted MDPs. In Algorithmic learning theory (pp. 320\u2013334). Berlin: Springer."},{"key":"5368_CR15","first-page":"623","volume":"5","author":"S. Mannor","year":"2004","unstructured":"Mannor, S., & Tsitsiklis, J. N. (2004). The sample complexity of exploration in the multi-armed bandit problem. 
Journal of Machine Learning Research, 5, 623\u2013648.","journal-title":"Journal of Machine Learning Research"},{"key":"5368_CR16","volume-title":"Proceedings of the 38th IEEE conference on decision and control","author":"R. Munos","year":"1999","unstructured":"Munos, R., & Moore, A. (1999). Influence and variance of a Markov chain: application to adaptive discretizations in optimal control. In Proceedings of the 38th IEEE conference on decision and control."},{"key":"5368_CR17","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316887","volume-title":"Markov decision processes, discrete stochastic dynamic programming","author":"M. L. Puterman","year":"1994","unstructured":"Puterman, M. L. (1994). Markov decision processes, discrete stochastic dynamic programming. New York: Wiley."},{"issue":"3","key":"5368_CR18","first-page":"227","volume":"16","author":"S. P. Singh","year":"1994","unstructured":"Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3), 227\u2013233.","journal-title":"Machine Learning"},{"key":"5368_CR19","doi-asserted-by":"crossref","first-page":"794","DOI":"10.2307\/3213832","volume":"19","author":"M. J. Sobel","year":"1982","unstructured":"Sobel, M. J. (1982). The variance of discounted Markov decision processes. Journal of Applied Probability, 19, 794\u2013802.","journal-title":"Journal of Applied Probability"},{"key":"5368_CR20","first-page":"2413","volume":"10","author":"A. L. Strehl","year":"2009","unstructured":"Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10, 2413\u20132444.","journal-title":"Journal of Machine Learning Research"},{"key":"5368_CR21","volume-title":"Reinforcement learning: an introduction","author":"R. S. Sutton","year":"1998","unstructured":"Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. 
Cambridge: MIT Press."},{"key":"5368_CR22","series-title":"Synthesis lectures on artificial intelligence and machine learning","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-01551-9","volume-title":"Algorithms for reinforcement learning","author":"C. Szepesv\u00e1ri","year":"2010","unstructured":"Szepesv\u00e1ri, C. (2010). Algorithms for reinforcement learning. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool."},{"key":"5368_CR23","first-page":"1031","volume-title":"Proceedings of the 27th international conference on machine learning","author":"I. Szita","year":"2010","unstructured":"Szita, I., & Szepesv\u00e1ri, C. (2010). Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the 27th international conference on machine learning (pp. 1031\u20131038). Omnipress."},{"key":"5368_CR24","unstructured":"Weissman, T., Ordentlich, E., Seroussi, G., Verdu, S., & Weinberger, M. J. (2003). Inequalities for the L1 deviation of the empirical distribution. Tech. rep."},{"key":"5368_CR25","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-642-27645-3","volume-title":"Reinforcement learning: State-of-the-Art","author":"M. Wiering","year":"2012","unstructured":"Wiering, M., & van Otterlo, M. (2012). Reinforcement learning: State-of-the-Art (pp. 3\u201339). Berlin: Springer. 
Chap.\u00a01."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-013-5368-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-013-5368-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-013-5368-1","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,1]],"date-time":"2023-07-01T18:16:06Z","timestamp":1688235366000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-013-5368-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,5,14]]},"references-count":25,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,6]]}},"alternative-id":["5368"],"URL":"https:\/\/doi.org\/10.1007\/s10994-013-5368-1","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,5,14]]}}}