{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T07:42:05Z","timestamp":1723016525254},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence setting ; this is also the problem of optimizing an unknown linear function over a discrete domain with noisy, zeroth-order access. We propose an explicitly implementable and provably order-optimal sample-complexity algorithm for best arm identification. Existing approaches that achieve optimal sample complexity assume either access to a nontrivial minimax optimization oracle (e.g. RAGE and Lazy-TS) or knowledge of an upper-bound on the norm of the unknown parameter vector(e.g. LinGame). Our algorithm, which we call the Phased Elimination Linear Exploration Game (PELEG), maintains a high-probability confidence ellipsoid containing the unknown vector, and uses it to eliminate suboptimal arms in phases, where a minimax problem is essentially solved in each phase using two-player low regret learners. PELEG does not require to know a bound on norm of the unknown vector, and is asymptotically-optimal, settling an open problem. We show that the sample complexity of PELEG matches, up to order and in a non-asymptotic sense, an instance-dependent universal lower bound for linear bandits. PELEG is thus the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.<\/jats:p>","DOI":"10.24963\/ijcai.2022\/515","type":"proceedings-article","created":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T02:55:56Z","timestamp":1657940156000},"page":"3709-3715","source":"Crossref","is-referenced-by-count":0,"title":["Improved Pure Exploration in Linear Bandits with No-Regret Learning"],"prefix":"10.24963","author":[{"given":"Mohammadi","family":"Zaki","sequence":"first","affiliation":[{"name":"Indian Institute of Science (IISc), Bangalore"}]},{"given":"Avi","family":"Mohan","sequence":"additional","affiliation":[{"name":"Boston University, MA"}]},{"given":"Aditya","family":"Gopalan","sequence":"additional","affiliation":[{"name":"Indian Institute of Science (IISc), Bangalore"}]}],"member":"10584","event":{"number":"31","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2022","name":"Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}","start":{"date-parts":[[2022,7,23]]},"theme":"Artificial Intelligence","location":"Vienna, Austria","end":{"date-parts":[[2022,7,29]]}},"container-title":["Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T11:10:08Z","timestamp":1658142608000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2022\/515"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2022\/515","relation":{},"subject":[],"published":{"date-parts":[[2022,7]]}}}