{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T06:05:18Z","timestamp":1777701918709,"version":"3.51.4"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2017,9,4]],"date-time":"2017-09-04T00:00:00Z","timestamp":1504483200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2017,10,31]]},"abstract":"<jats:p>\n            Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of\n            <jats:italic>O<\/jats:italic>\n            (\u221a\n            <jats:italic>NT<\/jats:italic>\n            ln\n            <jats:italic>T<\/jats:italic>\n            ) on the expected regret and (2) the optimal problem-dependent bound of (1 + \u03f5)\u03a3\n            <jats:sub>i<\/jats:sub>\n            ln\n            <jats:italic>T<\/jats:italic>\n            \/\n            <jats:italic>d<\/jats:italic>\n            (\u03bc\n            <jats:sub>i<\/jats:sub>\n            ,\u03bc\n            <jats:sub>1<\/jats:sub>\n            ) +\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>N<\/jats:italic>\n            \/\u03f5\n            <jats:sup>2<\/jats:sup>\n            ) on the expected regret (this bound was first proven by Kaufmann et al. (2012b)).\n          <\/jats:p>\n          <jats:p>\n            Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of\n            <jats:italic>O<\/jats:italic>\n            (\u221a\n            <jats:italic>NT<\/jats:italic>\n            ln\n            <jats:italic>N<\/jats:italic>\n            ) on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of \u03a9 (\u221a\n            <jats:italic>NT<\/jats:italic>\n            ) for the multiarmed bandit problem.\n          <\/jats:p>","DOI":"10.1145\/3088510","type":"journal-article","created":{"date-parts":[[2017,9,5]],"date-time":"2017-09-05T12:23:34Z","timestamp":1504614214000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":68,"title":["Near-Optimal Regret Bounds for Thompson Sampling"],"prefix":"10.1145","volume":"64","author":[{"given":"Shipra","family":"Agrawal","sequence":"first","affiliation":[{"name":"Microsoft Research"}]},{"given":"Navin","family":"Goyal","sequence":"additional","affiliation":[{"name":"Microsoft Research, Karnataka, India"}]}],"member":"320","published-online":{"date-parts":[[2017,9,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Stegun","author":"Abramowitz Milton","year":"1964","unstructured":"Milton Abramowitz and Irene A . Stegun . 1964 . Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover , New York. Milton Abramowitz and Irene A. Stegun. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 25th Annual Conference on Learning Theory (COLT\u201912)","author":"Agrawal Shipra","year":"2012","unstructured":"Shipra Agrawal and Navin Goyal . 2012 . Analysis of Thompson sampling for the multi-armed bandit problem . In Proceedings of the 25th Annual Conference on Learning Theory (COLT\u201912) . Shipra Agrawal and Navin Goyal. 2012. Analysis of Thompson sampling for the multi-armed bandit problem. In Proceedings of the 25th Annual Conference on Learning Theory (COLT\u201912)."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS\u201913)","author":"Agrawal Shipra","year":"2013","unstructured":"Shipra Agrawal and Navin Goyal . 2013 a. Further optimal regret bounds for thompson sampling . In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS\u201913) . Shipra Agrawal and Navin Goyal. 2013a. Further optimal regret bounds for thompson sampling. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS\u201913)."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)","author":"Agrawal Shipra","year":"2013","unstructured":"Shipra Agrawal and Navin Goyal . 2013 b. Thompson sampling for contextual bandits with linear payoffs . In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913) . Shipra Agrawal and Navin Goyal. 2013b. Thompson sampling for contextual bandits with linear payoffs. In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 22th Annual Conference on Learning Theory (COLT\u201909)","author":"Audibert Jean-Yves","year":"2009","unstructured":"Jean-Yves Audibert and S\u00e9bastien Bubeck . 2009 . Minimax policies for adversarial and stochastic bandits . In Proceedings of the 22th Annual Conference on Learning Theory (COLT\u201909) . Jean-Yves Audibert and S\u00e9bastien Bubeck. 2009. Minimax policies for adversarial and stochastic bandits. In Proceedings of the 22th Annual Conference on Learning Theory (COLT\u201909)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","DOI":"10.1561\/9781601986276","volume-title":"Regret analysis of stochastic and nonstochastic multi-armed bandit problems. CoRR","author":"Bubeck S\u00e9bastien","year":"2012","unstructured":"S\u00e9bastien Bubeck and Nicol\u00f2 Cesa-Bianchi . 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. CoRR ( 2012 ). S\u00e9bastien Bubeck and Nicol\u00f2 Cesa-Bianchi. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. CoRR (2012)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CISS.2014.6814158"},{"key":"e_1_2_1_9_1","unstructured":"Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems (NIPS\u201911) 24. 2249--2257.  Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems (NIPS\u201911) 24. 2249--2257."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 24th Annual Conference on Learning Theory (COLT\u201911)","author":"Garivier A.","unstructured":"A. Garivier and O. Capp\u00e9 . 2011. The KL-UCB algorithm for bounded stochastic bandits and beyond . In Proceedings of the 24th Annual Conference on Learning Theory (COLT\u201911) . A. Garivier and O. Capp\u00e9. 2011. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of the 24th Annual Conference on Learning Theory (COLT\u201911)."},{"key":"e_1_2_1_11_1","volume-title":"Multi-armed Bandit Allocation Indices","author":"Gittins John C.","unstructured":"John C. Gittins . 1989. Multi-armed Bandit Allocation Indices . John Wiley and Son . John C. Gittins. 1989. Multi-armed Bandit Allocation Indices. John Wiley and Son."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104326"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1108\/17563781011049179"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apal.2003.12.003"},{"key":"e_1_2_1_15_1","volume-title":"15th International Conference on Artificial Intelligence and Statistics (AISTATS\u201912)","author":"Kaufmann E.","unstructured":"E. Kaufmann , O. Capp , and A. Garivier . 2012a. On Bayesian upper confidence bounds for bandit problems . In 15th International Conference on Artificial Intelligence and Statistics (AISTATS\u201912) . E. Kaufmann, O. Capp, and A. Garivier. 2012a. On Bayesian upper confidence bounds for bandit problems. In 15th International Conference on Artificial Intelligence and Statistics (AISTATS\u201912)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-34106-9_18"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 28th AAAI Conference on Artificial Intelligence. 1911--1917","author":"Koc\u00e1k Tom\u00e1s","year":"2014","unstructured":"Tom\u00e1s Koc\u00e1k , Michal Valko , R\u00e9mi Munos , and Shipra Agrawal . 2014 . Spectral Thompson sampling . In Proceedings of the 28th AAAI Conference on Artificial Intelligence. 1911--1917 . Tom\u00e1s Koc\u00e1k, Michal Valko, R\u00e9mi Munos, and Shipra Agrawal. 2014. Spectral Thompson sampling. In Proceedings of the 28th AAAI Conference on Artificial Intelligence. 1911--1917."},{"key":"e_1_2_1_18_1","unstructured":"Nathaniel Korda Emilie Kaufmann and R\u00e9mi Munos. 2013. Thompson sampling for 1-dimensional exponential family bandits. In Advances in Neural Information Processing Systems (NIPS\u201913) 26. 1448--1456.  Nathaniel Korda Emilie Kaufmann and R\u00e9mi Munos. 2013. Thompson sampling for 1-dimensional exponential family bandits. In Advances in Neural Information Processing Systems (NIPS\u201913) 26. 1448--1456."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196-8858(85)90002-8"},{"key":"e_1_2_1_20_1","volume-title":"Generalized Thompson sampling for contextual bandits. CoRR abs\/1310.7163","author":"Lihong Li.","year":"2013","unstructured":"Lihong Li. 2013. Generalized Thompson sampling for contextual bandits. CoRR abs\/1310.7163 ( 2013 ). http:\/\/arxiv.org\/abs\/1310.7163 Lihong Li. 2013. Generalized Thompson sampling for contextual bandits. CoRR abs\/1310.7163 (2013). http:\/\/arxiv.org\/abs\/1310.7163"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 25th Annual Conference on Learning Theory (COLT\u201912)","author":"Li Lihong","year":"2012","unstructured":"Lihong Li and Olivier Chapelle . 2012 . Open problem: Regret bounds for Thompson sampling . In Proceedings of the 25th Annual Conference on Learning Theory (COLT\u201912) . Lihong Li and Olivier Chapelle. 2012. Open problem: Regret bounds for Thompson sampling. In Proceedings of the 25th Annual Conference on Learning Theory (COLT\u201912)."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 24th Annual Conference on Learning Theory (COLT\u201911)","author":"Maillard Odalric-Ambrym","year":"2011","unstructured":"Odalric-Ambrym Maillard , R\u00e9mi Munos , and Gilles Stoltz . 2011 . Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences . In Proceedings of the 24th Annual Conference on Learning Theory (COLT\u201911) . Odalric-Ambrym Maillard, R\u00e9mi Munos, and Gilles Stoltz. 2011. Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences. In Proceedings of the 24th Annual Conference on Learning Theory (COLT\u201911)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/1892211.1892223"},{"key":"e_1_2_1_26_1","volume-title":"Advances in Neural Information Processing Systems (NIPS\u201913) 26","author":"Russo Daniel","year":"2013","unstructured":"Daniel Russo , Ian Osband , and Benjamin Van Roy . 2013. (More) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems (NIPS\u201913) 26 ( 2013 ). Daniel Russo, Ian Osband, and Benjamin Van Roy. 2013. (More) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems (NIPS\u201913) 26 (2013)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1287\/moor.2014.0650"},{"key":"e_1_2_1_28_1","first-page":"1","article-title":"An information-theoretic analysis of Thompson sampling","volume":"17","author":"Russo Daniel","year":"2016","unstructured":"Daniel Russo and Benjamin Van Roy . 2016 . An information-theoretic analysis of Thompson sampling . The Journal of Machine Learning Research 17 , 1 (Jan. 2016), 2442--2471. Daniel Russo and Benjamin Van Roy. 2016. An information-theoretic analysis of Thompson sampling. The Journal of Machine Learning Research 17, 1 (Jan. 2016), 2442--2471.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1002\/asmb.874"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 17th International Conference on Machine Learning (ICML\u201900)","author":"Strens Malcolm J. A.","year":"2000","unstructured":"Malcolm J. A. Strens . 2000 . A Bayesian framework for reinforcement learning . In Proceedings of the 17th International Conference on Machine Learning (ICML\u201900) . 943--950. Malcolm J. A. Strens. 2000. A Bayesian framework for reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning (ICML\u201900). 943--950."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/25.3-4.285"}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3088510","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3088510","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:30:07Z","timestamp":1750217407000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3088510"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9,4]]},"references-count":29,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2017,10,31]]}},"alternative-id":["10.1145\/3088510"],"URL":"https:\/\/doi.org\/10.1145\/3088510","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"value":"0004-5411","type":"print"},{"value":"1557-735X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,9,4]]},"assertion":[{"value":"2015-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-09-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}