{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T17:10:04Z","timestamp":1781370604104,"version":"3.54.1"},"reference-count":122,"publisher":"Emerald","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,1,20]]},"abstract":"<jats:p>This work covers several aspects of the optimism in the face of uncertainty principle applied to large scale optimization problems under finite numerical budget. The initial motivation for the research reported here originated from the empirical success of the so-called Monte-Carlo Tree Search method popularized in Computer Go and further extended to many other games as well as optimization and planning problems. Our objective is to contribute to the development of theoretical foundations of the field by characterizing the complexity of the underlying optimization problems and designing efficient algorithms with performance guarantees.<\/jats:p>\n                  <jats:p>The main idea presented here is that it is possible to decompose a complex decision making problem (such as an optimization problem in a large search space) into a sequence of elementary decisions, where each decision of the sequence is solved using a (stochastic) multi-armed bandit (simple mathematical model for decision making in stochastic environments). This so-called hierarchical bandit approach (where the reward observed by a bandit in the hierarchy is itself the return of another bandit at a deeper level) possesses the nice feature of starting the exploration by a quasi-uniform sampling of the space and then focusing progressively on the most promising area, at different scales, according to the evaluations observed so far, until eventually performing a local search around the global optima of the function. The performance of the method is assessed in terms of the optimality of the returned solution as a function of the number of function evaluations.<\/jats:p>\n                  <jats:p>Our main contribution to the field of function optimization is a class of hierarchical optimistic algorithms designed for general search spaces (such as metric spaces, trees, graphs, Euclidean spaces) with different algorithmic instantiations depending on whether the evaluations are noisy or noiseless and whether some measure of the \u201csmoothness\u201d of the function is known or unknown. The performance of the algorithms depends on the \u201clocal\u201d behavior of the function around its global optima expressed in terms of the quantity of near-optimal states measured with some metric. If this local smoothness of the function is known then one can design very efficient optimization algorithms (with convergence rate independent of the space dimension). When this information is unknown, one can build adaptive techniques which, in some cases, perform almost as well as when it is known.<\/jats:p>\n                  <jats:p>In order to be self-contained, we start with a brief introduction to the stochastic multi-armed bandit problem in Chapter 1 and describe the UCB (Upper Confidence Bound) strategy and several extensions. In Chapter 2 we present the Monte-Carlo Tree Search method applied to Computer Go and show the limitations of previous algorithms such as UCT (UCB applied to Trees). This provides motivation for designing theoretically well-founded optimistic optimization algorithms. The main contributions on hierarchical optimistic optimization are described in Chapters 3 and 4 where the general setting of a semi-metric space is introduced and algorithms designed for optimizing a function assumed to be locally smooth (around its maxima) with respect to a semi-metric are presented and analyzed. Chapter 3 considers the case when the semi-metric is known and can be used by the algorithm, whereas Chapter 4 considers the case when it is not known and describes an adaptive technique that does almost as well as when it is known. Finally in Chapter 5 we describe optimistic strategies for a specific structured problem, namely the planning problem in Markov decision processes with infinite horizon discounted rewards.<\/jats:p>","DOI":"10.1561\/2200000038","type":"journal-article","created":{"date-parts":[[2014,1,20]],"date-time":"2014-01-20T10:19:04Z","timestamp":1390213144000},"page":"1-129","source":"Crossref","is-referenced-by-count":122,"title":["From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning"],"prefix":"10.1108","volume":"7","author":[{"given":"R\u00e9mi","family":"Munos","sequence":"first","affiliation":[{"name":"INRIA Lille \u2013 Nord Europe"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"140","published-online":{"date-parts":[[2014,1,20]]},"reference":[{"key":"2026033014112017700_ref001","article-title":"Improved algorithms for linear stochastic bandits","author":"Abbasi-Yadkori","year":"2011","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2026033014112017700_ref002","article-title":"Online-to-confidence-set conversions and application to sparse stochastic bandits","author":"Abbasi-Yadkori","year":"2012","journal-title":"In Artificial Intelligence and Statistics"},{"key":"2026033014112017700_ref003","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1109\/34.44404","article-title":"Expected-outcome: A general model of static evaluation","volume":"12","author":"Abramson","year":"1990","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2026033014112017700_ref004","article-title":"Stochastic convex optimization with bandit feedback","author":"Agarwal","year":"2011","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2026033014112017700_ref005","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1109\/TIT.2011.2182178","article-title":"Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization","volume":"58","author":"Agarwal","year":"2012","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033014112017700_ref006","doi-asserted-by":"crossref","first-page":"1926","DOI":"10.1137\/S0363012992237273","article-title":"The continuum-armed bandit problem","volume":"33","author":"Agrawal","year":"1995","journal-title":"SIAM Journal on Control and Optimization"},{"key":"2026033014112017700_ref007","doi-asserted-by":"crossref","first-page":"1054","DOI":"10.2307\/1427934","article-title":"Sample mean based index policies with O(log n) regret for the multi-armed bandit problem","volume":"27","author":"Agrawal","year":"1995","journal-title":"Advances in Applied Probability"},{"key":"2026033014112017700_ref008","article-title":"Analysis of Thompson sampling for the multi-armed bandit problem","author":"Agrawal","year":"2012","journal-title":"In Conference on Learning Theory"},{"key":"2026033014112017700_ref009","article-title":"Further optimal regret bounds for Thompson sampling","author":"Agrawal","year":"2013","journal-title":"In Sixteenth International Conference on Artificial Intelligence and Statistics"},{"key":"2026033014112017700_ref010","article-title":"Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search","author":"Asmuth","year":"2011","journal-title":"In Uncertainty in Artificial Intelligence"},{"key":"2026033014112017700_ref011","doi-asserted-by":"crossref","first-page":"1876","DOI":"10.1016\/j.tcs.2009.01.016","article-title":"Exploration-exploitation trade-off using variance estimates in multi-armed bandits","volume":"410","author":"Audibert","year":"2009","journal-title":"Theoretical Computer Science"},{"key":"2026033014112017700_ref012","article-title":"Best arm identification in multi-armed bandits","author":"Audibert","year":"2010","journal-title":"In Conference on Learning Theory"},{"key":"2026033014112017700_ref013","article-title":"Minimax policies for adversarial and stochastic bandits","author":"Audibert","year":"2009","journal-title":"In Sanjot Dasgupta and Adam Klivans, editors, Proceedings of the 22nd annual Conference on Learning Theory, COLT 2009, Montreal, Quebec, Canada, Jun 2009"},{"key":"2026033014112017700_ref014","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1007\/978-3-540-72927-3_33","article-title":"Improved rates for the stochastic continuum-armed bandit problem","author":"Auer","year":"2007","journal-title":"20th Conference on Learning Theory"},{"key":"2026033014112017700_ref015","first-page":"397","article-title":"Using confidence bounds for exploitation-exploration trade-offs","volume":"3","author":"Auer","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"2026033014112017700_ref016","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1013689704352","article-title":"Finite time analysis of the multiarmed bandit problem","volume":"47","author":"Auer","year":"2002","journal-title":"Machine Learning"},{"key":"2026033014112017700_ref017","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1137\/S0097539701398375","article-title":"The nonstochastic multiarmed bandit problem","volume":"32","author":"Auer","year":"2003","journal-title":"SIAM Journal on Computing"},{"key":"2026033014112017700_ref018","first-page":"289","article-title":"Theory of Randomized Search Heuristics: Foundations and Recent Developments","author":"Auger","year":"2011","journal-title":"World Scientific Publishing"},{"key":"2026033014112017700_ref019","doi-asserted-by":"crossref","first-page":"357","DOI":"10.2748\/tmj\/1178243286","article-title":"Weighted sums of certain dependent random variables","volume":"19","author":"Azuma","year":"1967","journal-title":"Tohoku Mathematical Journal"},{"key":"2026033014112017700_ref020","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.2307\/2951539","article-title":"Denumerable-armed bandits","volume":"60","author":"Banks","year":"1992","journal-title":"Econometrica"},{"key":"2026033014112017700_ref021","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-18324-9","article-title":"Markov Decision Processes with Applications to Finance","author":"B\u00e4uerle","year":"2011"},{"key":"2026033014112017700_ref022","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1214\/aos\/1069362389","article-title":"Bandit problems with infinitely many arms","volume":"25","author":"Berry","year":"1997","journal-title":"Annals of Statistics"},{"key":"2026033014112017700_ref023","article-title":"Neuro-Dynamic Programming","author":"Bertsekas","year":"1996","journal-title":"Athena Scientific"},{"key":"2026033014112017700_ref024","article-title":"Scalability and parallelization of monte-carlo tree search","author":"Bouzy","year":"2012","journal-title":"In International Conference on Computers and Games"},{"key":"2026033014112017700_ref025","first-page":"159","article-title":"Monte-Carlo go developments","author":"Bouzy","year":"2003","journal-title":"In Hiroyuki Iida, Ernst A. Heinz H., Jaap van den Herik, editor, Advances in Computer Games"},{"key":"2026033014112017700_ref026","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/S0004-3702(01)00127-8","article-title":"Computer Go: an AI oriented survey","volume":"132","author":"Bouzy","year":"2001","journal-title":"Artificial Intelligence"},{"key":"2026033014112017700_ref027","doi-asserted-by":"crossref","DOI":"10.1109\/TCIAIG.2012.2186810","article-title":"A survey of monte carlo tree search methods","volume":"4","author":"Browne","year":"2012","journal-title":"IEEE Transactions on Computational Intelligence and AI in Games"},{"key":"2026033014112017700_ref028","article-title":"Monte Carlo Go","author":"Br\u00fcgmann","year":"1993","journal-title":"Technical report, Syracuse University, NY, USA"},{"key":"2026033014112017700_ref029","article-title":"Open loop optimistic planning","author":"Bubeck","year":"2010","journal-title":"In Conference on Learning Theory"},{"key":"2026033014112017700_ref030","first-page":"201","article-title":"Online optimization of X-armed bandits","volume":"22","author":"Bubeck","year":"2008","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2026033014112017700_ref031","article-title":"Pure exploration in multi-armed bandits problems","author":"Bubeck","year":"2009","journal-title":"In Proceedings of the 20th International Conference on Algorithmic Learning Theory"},{"key":"2026033014112017700_ref032","first-page":"1655","article-title":"X-armed bandits","volume":"12","author":"Bubeck","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"2026033014112017700_ref033","article-title":"Lipschitz bandits without the Lipschitz constant","author":"Bubeck","year":"2011","journal-title":"In International Conference on Algorithmic Learning Theory"},{"key":"2026033014112017700_ref034","article-title":"Bandits Games and Clustering Foundations","author":"Bubeck","year":"2010","journal-title":"PhD thesis, Universit\u00e9 de Lille 1"},{"key":"2026033014112017700_ref035","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000024","article-title":"Regret analysis of stochastic and nonstochastic multi-armed bandit problems","volume":"5","author":"Bubeck","year":"2012","journal-title":"Foundations and Trends in Machine Learning"},{"key":"2026033014112017700_ref036","article-title":"Optimistic planning in Markov decision processes","author":"Busoniu","year":"2011","journal-title":"In Frank Lewis and Derong Liu, editors, Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control"},{"key":"2026033014112017700_ref037","first-page":"182","article-title":"Optimistic planning for markov decision processes","author":"Busoniu","year":"2012","journal-title":"In Proceedings 15th International Conference on Artificial Intelligence and Statistics (AISTATS-12)"},{"key":"2026033014112017700_ref038","article-title":"Reinforcement Learning and Dynamic Programming Using Function Approximators","author":"Busoniu","year":"2010","journal-title":"Taylor & Francis CRC Press"},{"key":"2026033014112017700_ref039","doi-asserted-by":"crossref","DOI":"10.1109\/ADPRL.2011.5967375","article-title":"Optimistic planning for sparsely stochastic systems","author":"Busoniu","year":"2011","journal-title":"In 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11), Paris, France, 11\u201315 April"},{"key":"2026033014112017700_ref040","article-title":"Adaptive-treed bandits","author":"Bull","year":"2013","journal-title":"Technical report, arXiv:1302.2489v2"},{"key":"2026033014112017700_ref041","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1006\/aama.1996.0007","article-title":"Optimal adaptive policies for sequential allocation problems","volume":"17","author":"Burnetas","year":"1996","journal-title":"Advances in Applied Mathematics"},{"key":"2026033014112017700_ref042","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1006\/aama.1996.0007","article-title":"Optimal adaptive policies for sequential allocation problems","volume":"17","author":"Burnetas","year":"1996","journal-title":"Advances in Applied Mathematics"},{"key":"2026033014112017700_ref043","article-title":"Model Predictive Control","author":"Camacho","year":"2004","journal-title":"Springer-Verlag"},{"key":"2026033014112017700_ref044","doi-asserted-by":"crossref","first-page":"1516","DOI":"10.1214\/13-AOS1119","article-title":"Kullback-leibler upper confidence bounds for optimal sequential allocation","volume":"41","author":"Capp\u00e9","year":"2013","journal-title":"Annals of Statistics"},{"key":"2026033014112017700_ref045","article-title":"Bandit theory meets compressed sensing for high dimensional stochastic linear bandit","author":"Carpentier","year":"2012","journal-title":"In International Conference on Artificial Intelligence and Statistics"},{"key":"2026033014112017700_ref046","article-title":"Prediction, Learning, and Games","author":"Cesa-Bianchi","year":"2006","journal-title":"Cambridge University Press, New York, NY, USA"},{"key":"2026033014112017700_ref047","article-title":"Simulation-based Algorithms for Markov Decision Processes","author":"Chang","year":"2007","journal-title":"Springer, London"},{"key":"2026033014112017700_ref048","article-title":"Monte-Carlo Tree Search","author":"Chaslot","year":"2010","journal-title":"PhD thesis, Maastricht University"},{"key":"2026033014112017700_ref049","article-title":"Bandit algorithms for tree search","author":"Coquelin","year":"2007","journal-title":"In Uncertainty in Artificial Intelligence"},{"key":"2026033014112017700_ref050","first-page":"72","article-title":"Efficient selectivity and backup operators in Monte-Carlo Tree Search","author":"Coulom","year":"2006","journal-title":"In LNCS, editor, Computer Games, volume 4630"},{"key":"2026033014112017700_ref051","first-page":"355","article-title":"Stochastic linear optimization under bandit feedback","author":"Dani","year":"2008","journal-title":"In Rocco A. Servedio and Tong Zhang, editors, Proceedings of the 21st annual Conference on Learning Theory, volume 80 of COLT 08, Helsinki, Finland, jul"},{"key":"2026033014112017700_ref052","first-page":"1","article-title":"Lazy planning under uncertainties by optimizing decisions on an ensemble of incomplete disturbance trees","author":"Defourny","year":"2008","journal-title":"In S. Girgin, M. Loth, R. Munos, P. Preux, and D. Ryabko, editors, Recent Advances in Reinforcement Learning, volume 5323 of Lecture Notes in Computer Science"},{"key":"2026033014112017700_ref053","article-title":"Optimal learning: computational procedures for Bayes-adaptive Markov decision processes","author":"Duff","year":"2002","journal-title":"PhD thesis, Department of Computer Science, University of Massachusetts, Amherst"},{"key":"2026033014112017700_ref054","first-page":"586","article-title":"Parametric bandits: The generalized linear case","author":"Filippi","year":"2010","journal-title":"In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23"},{"key":"2026033014112017700_ref055","article-title":"Convergence analysis of the direct algorithm","author":"Finkel","year":"2004","journal-title":"Technical report, North Carolina State University, Center for"},{"key":"2026033014112017700_ref056","first-page":"385","article-title":"Online convex optimization in the bandit setting: gradient descent without a gradient","author":"Flaxman","year":"2005","journal-title":"In Proceedings of the 16th annual ACM-SIAM Symposium on Discrete Algorithms, SODA 05"},{"key":"2026033014112017700_ref057","article-title":"Deterministic Global Optimization: Theory, Algorithms and Applications","author":"Floudas","year":"1999","journal-title":"Kluwer Academic Publishers, Dordrecht \/ Boston \/ London"},{"key":"2026033014112017700_ref058","article-title":"Optimistic planning for belief augmented Markov decision processes","author":"Fonteneau","year":"2013","journal-title":"In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning"},{"key":"2026033014112017700_ref059","article-title":"Modifications of the Direct algorithm","author":"Gablonsky","year":"2001","journal-title":"PhD thesis"},{"key":"2026033014112017700_ref060","article-title":"The KL-UCB algorithm for bounded stochastic bandits and beyond","author":"Garivier","year":"2011","journal-title":"In Proceedings of the 24th annual Conference on Learning Theory, COLT 11"},{"key":"2026033014112017700_ref061","article-title":"Modification of UCT with patterns in monte-carlo go","author":"Gelly","year":"2006","journal-title":"Technical report, INRIA RR-6062"},{"key":"2026033014112017700_ref062","first-page":"273","article-title":"Combining online and offline knowledge in UCT","author":"Gelly","year":"2007","journal-title":"In Zoubin Ghahramani, editor, International Conference on Machine Learning, volume 227 of ICML 07, ACM International Conference Proceedings Series, Corvallis, Oregon, USA, jun"},{"key":"2026033014112017700_ref063","doi-asserted-by":"crossref","first-page":"1856","DOI":"10.1016\/j.artint.2011.03.007","article-title":"Monte-carlo tree search and rapid action value estimation in computer Go","volume":"175","author":"Gelly","year":"2011","journal-title":"Artificial Intelligence"},{"key":"2026033014112017700_ref064","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1111\/j.2517-6161.1979.tb01068.x","article-title":"Bandit processes and dynamic allocation indices","volume":"41","author":"Gittins","year":"1979","journal-title":"In Journal of the Royal Statistical Society Series B"},{"key":"2026033014112017700_ref065","article-title":"Multi-armed Bandit Allocation Indices","author":"Gittins","year":"1989","journal-title":"Wiley"},{"key":"2026033014112017700_ref066","article-title":"Efficient Bayes-adaptive reinforcement learning using sample-based search","author":"Guez","year":"2012","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2026033014112017700_ref067","article-title":"Global Optimization Using Interval Analysis","author":"Hansen","year":"1992","journal-title":"Marcel Dekker, New York"},{"key":"2026033014112017700_ref068","article-title":"A heuristic search algorithm for Markov decision problems","author":"Hansen","year":"1999","journal-title":"In Proceedings of the Bar-Ilan Symposium on the Foundation of Artificial Intelligence, Ramat Gan, Israel"},{"key":"2026033014112017700_ref069","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1007\/s10994-011-5257-4","article-title":"An asymptotically optimal policy for finite support models in the multiarmed bandit problem","volume":"85","author":"Honda","year":"2011","journal-title":"Machine Learning"},{"key":"2026033014112017700_ref070","first-page":"67","article-title":"An asymptotically optimal bandit algorithm for bounded support models","author":"Honda","year":"2010","journal-title":"In Adam Tauman Kalai and Mehryar Mohri, editors, Proceedings of the 23rd annual Conference on Learning Theory"},{"key":"2026033014112017700_ref071","article-title":"Global Optimization: Deterministic ? Deterministic Approaches","author":"Horst","year":"1996","journal-title":"Springer, Berlin \/ Heidelberg \/ New York, 3rd edition"},{"key":"2026033014112017700_ref072","first-page":"151","article-title":"Optimistic planning of deterministic systems","author":"Hren","year":"2008","journal-title":"In European Workshop on Reinforcement Learning, Springer LNAI 5323, editor, Recent Advances in Reinforcement Learning"},{"key":"2026033014112017700_ref073","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/BF00941892","article-title":"Lipschitzian optimization without the lipschitz constant","volume":"79","author":"Jones","year":"1993","journal-title":"Journal of Optimization Theory and Applications"},{"key":"2026033014112017700_ref074","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and acting in partially observable stochastic domains","volume":"101","author":"Kaelbling","year":"1998","journal-title":"Artificial Intelligence"},{"key":"2026033014112017700_ref075","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.jcss.2004.10.016","article-title":"Efficient algorithms for online decision problems","volume":"71","author":"Kalai","year":"2005","journal-title":"Journal of Computer and System Sciences"},{"key":"2026033014112017700_ref076","article-title":"On Bayesian upper confidence bounds for bandit problems","author":"Kaufmann","year":"2012","journal-title":"In International Conference on Artificial Intelligence and Statistics"},{"key":"2026033014112017700_ref077","article-title":"Thompson sampling: An asymptotically optimal finite time analysis","author":"Kaufmann","year":"2012","journal-title":"In International Conference on Algorithmic Learning Theory"},{"key":"2026033014112017700_ref078","article-title":"Thompson sampling for 1-dimensional exponential family bandits","author":"Kaufmann","year":"2013","journal-title":"In Neural Information Processing Systems"},{"key":"2026033014112017700_ref079","article-title":"Rigorous Global Search: Continuous Problems","author":"Kearfott","year":"1996","journal-title":"Kluwer Academic Publishers, Dordrecht \/ Boston \/ London"},{"key":"2026033014112017700_ref080","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1023\/A:1017932429737","article-title":"A sparse sampling algorithm for near-optimal planning in large Markovian decision processes","volume":"49","author":"Kearns","year":"2002","journal-title":"In Machine Learning"},{"key":"2026033014112017700_ref081","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1023\/A:1017932429737","article-title":"A sparse sampling algorithm for near-optimal planning in large Markov decision processes","volume":"49","author":"Kearns","year":"2002","journal-title":"Machine Learning"},{"key":"2026033014112017700_ref082","article-title":"Multi-armed bandits in metric spaces","author":"Kleinberg","year":"2008","journal-title":"In Proceedings of the 40th ACM Symposium on Theory of Computing"},{"key":"2026033014112017700_ref083","article-title":"Nearly tight bounds for the continuum-armed bandit problem","author":"Kleinberg","year":"2004","journal-title":"In Proceedings of the 18th Conference on Advances in Neural Information Processing Systems, NIPS 04, Vancouver, British Columbia, Canada, dec"},{"key":"2026033014112017700_ref084","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1145\/1374376.1374475","article-title":"Multi-armed bandit problems in metric spaces","author":"Kleinberg","year":"2008","journal-title":"In Proceedings of the 40th ACM Symposium on Theory of Computing, STOC 08"},{"key":"2026033014112017700_ref085","first-page":"282","article-title":"Bandit based Monte-Carlo planning","author":"Kocsis","year":"2006","journal-title":"In Proceedings of the 17th European Conference on Machine Learning (ECML 2006)"},{"key":"2026033014112017700_ref086","article-title":"Following the perturbed leader to gamble at multi-armed bandits","author":"Kujala","year":"2007","journal-title":"In International Conference on Algorithmic Learning Theory"},{"key":"2026033014112017700_ref087","article-title":"Planning Algorithms","author":"LaValle","year":"2006","journal-title":"Cambridge University Press"},{"key":"2026033014112017700_ref088","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/0196-8858(85)90002-8","article-title":"Asymptotically efficient adaptive allocation rules","volume":"6","author":"Lai","year":"1985","journal-title":"Advances in Applied Mathematics"},{"key":"2026033014112017700_ref089","first-page":"73","article-title":"The computational intelligence of MoGo revealed in Taiwan computer Go tournaments","volume":"1","author":"Lee","year":"2009","journal-title":"IEEE Transactions on Computational Intelligence and AI in Games"},{"key":"2026033014112017700_ref090","article-title":"Predictive Control with Constraints","author":"Maciejowski","year":"2002","journal-title":"Prentice Hall"},{"key":"2026033014112017700_ref091","article-title":"Apprentissage s\u00e9quentiel: Bandits, Statistique et Renforcement","author":"Maillard","year":"2011","journal-title":"PhD thesis, Universit\u00e9 des Sciences et des Technologies de Lille 1"},{"key":"2026033014112017700_ref092","article-title":"Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences","author":"Maillard","year":"2011","journal-title":"In Proceedings of the 24th annual Conference on Learning Theory, COLT 11"},{"key":"2026033014112017700_ref093","article-title":"Optimistic optimization of deterministic functions without the knowledge of its smoothness","author":"Munos","year":"2011","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2026033014112017700_ref094","article-title":"Problem Complexity and Method Efficiency in Optimization","author":"Nemirovsky","year":"1983","journal-title":"John Wiley & Sons Ltd"},{"key":"2026033014112017700_ref095","article-title":"Introductory Lectures on Convex Optimization: A Basic Course","author":"Nesterov","year":"2004","journal-title":"Kluwer Academic Publishers"},{"key":"2026033014112017700_ref096","article-title":"Interval Methods for Systems of Equations","author":"Neumaier","year":"1990","journal-title":"Cambridge University Press"},{"key":"2026033014112017700_ref097","article-title":"Principles of Artificial Intelligence","author":"Nilsson","year":"1980","journal-title":"Tioga Publishing"},{"key":"2026033014112017700_ref098","article-title":"On-line search for solving large Markov decision processes","author":"P\u00e9ret","year":"2004","journal-title":"In Proceedings of the 16th European Conference on Artificial Intelligence"},{"key":"2026033014112017700_ref099","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1613\/jair.2078","article-title":"Anytime point-based approximations for large POMDPs","volume":"27","author":"Pineau","year":"2006","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2026033014112017700_ref100","article-title":"Global Optimization in Action (Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications)","author":"Pint\u00e9r","year":"1996","journal-title":"Kluwer Academic Publishers"},{"key":"2026033014112017700_ref101","article-title":"Markov Decision Processes \u2014 Discrete Stochastic Dynamic Programming","author":"Puterman","year":"1994","journal-title":"John Wiley & Sons, Inc., New York, NY"},{"key":"2026033014112017700_ref102","article-title":"Biasing Monte-Carlo simulations through RAVE values","author":"Rimmel","year":"2010","journal-title":"In International Conference on Computers and Games"},{"key":"2026033014112017700_ref103","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1090\/S0002-9904-1952-09620-8","article-title":"Some aspects of the sequential design of experiments","volume":"58","author":"Robbins","year":"1952","journal-title":"Bulletin of the American Mathematical Society"},{"key":"2026033014112017700_ref104","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1613\/jair.2567","article-title":"Online planning algorithms for POMDPs","volume":"32","author":"Ross","year":"2008","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2026033014112017700_ref105","first-page":"1655","article-title":"A Bayesian approach for learning and planning in partially observable Markov decision processes","volume":"12","author":"Ross","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"2026033014112017700_ref106","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1287\/moor.1100.0446","article-title":"Linearly parameterized bandits","volume":"35","author":"Rusmevichientong","year":"2010","journal-title":"Mathematics of Operations Research"},{"key":"2026033014112017700_ref107","article-title":"Markov Decision Processes in Artificial Intelligence","author":"Sigaud","year":"2010","journal-title":"Wiley"},{"key":"2026033014112017700_ref108","article-title":"Reinforcement Learning and Simulation-Based Search in Computer Go","author":"Silver","year":"2009","journal-title":"PhD thesis, University of Alberta"},{"key":"2026033014112017700_ref109","article-title":"Monte-carlo planning in large POMDPs","author":"Silver","year":"2012","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2026033014112017700_ref110","article-title":"Contextual bandits with similarity information","author":"Slivkins","year":"2011","journal-title":"In Proceedings of the 24th annual Conference on Learning Theory, COLT 11"},{"key":"2026033014112017700_ref111","first-page":"1015","article-title":"Gaussian process optimization in the bandit setting: No regret and experimental design","author":"Srinivas","year":"2010","journal-title":"In International Conference on Machine Learning"},{"key":"2026033014112017700_ref112","article-title":"Global Optimization with Non-Convex Constraints: Sequential and Parallel Algorithms","author":"Strongin","year":"2000","journal-title":"Kluwer Academic Publishers, Dordrecht \/ Boston \/ London"},{"key":"2026033014112017700_ref113","article-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"1998","journal-title":"MIT Press"},{"key":"2026033014112017700_ref114","article-title":"Algorithms for Reinforcement Learning","author":"Szepesv\u00e1ri","year":"2010","journal-title":"Morgan & Claypool Publishers"},{"key":"2026033014112017700_ref115","doi-asserted-by":"crossref","first-page":"450","DOI":"10.2307\/2371219","article-title":"On the theory of apportionment","volume":"57","author":"Thompson","year":"1935","journal-title":"American Journal of Mathematics"},{"key":"2026033014112017700_ref116","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the likelihood that one unknown probability exceeds another in view of the evidence of two samples","volume":"25","author":"Thompson","year":"1933","journal-title":"Biometrika"},{"key":"2026033014112017700_ref117","article-title":"Information-based Complexity","author":"Traub","year":"1988","journal-title":"Academic Press, New York"},{"key":"2026033014112017700_ref118","article-title":"Stochastic simultaneous optimistic optimization","author":"Valko","year":"2013","journal-title":"In International Conference on Machine Learning"},{"key":"2026033014112017700_ref119","article-title":"Reinforcement Learning: State of the Art","author":"Vlassis","year":"2012","journal-title":"Springer Verlag"},{"key":"2026033014112017700_ref120","article-title":"Bayesian sparse sampling for on-line reward optimization","author":"Wang","year":"2005","journal-title":"In International Conference on Machine Learning"},{"key":"2026033014112017700_ref121","first-page":"175","article-title":"Modifications of UCT and sequence-like simulations for Monte-Carlo Go","author":"Wang","year":"2007","journal-title":"In IEEE Symposium on Computational Intelligence and Games"},{"key":"2026033014112017700_ref122","first-page":"1729","article-title":"Algorithms for infinitely many-armed bandits","author":"Wang","year":"2008","journal-title":"In Daphne Koller, Dale Schuurmans, Yoshua Bengio, and L\u00e9on Bottou, editors, Proceedings of the 22nd Conference on Advances in Neural Information Processing Systems, NIPS 08, Vancouver, British Columbia, Canada, dec"}],"container-title":["Foundations and Trends\u00ae in Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/7\/1\/1\/11147464\/2200000038en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/7\/1\/1\/11147464\/2200000038en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:10:58Z","timestamp":1777486258000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftmal\/article\/7\/1\/1\/1332170\/From-Bandits-to-Monte-Carlo-Tree-Search-The"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,1,20]]},"references-count":122,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,1,20]]}},"URL":"https:\/\/doi.org\/10.1561\/2200000038","relation":{},"ISSN":["1935-8237","1935-8245"],"issn-type":[{"value":"1935-8237","type":"print"},{"value":"1935-8245","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,1,20]]}}}