We consider the adversarial convex bandit problem and build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so, we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5}\sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step (for polytopes with polynomially many constraints) at the cost of an additional $\mathrm{poly}(n)\,T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11}\sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)}\sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5}\sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n\sqrt{T})$, which is achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization, this corresponds to the conjecture that the optimal query complexity is of order $n^{3}/\varepsilon^{2}$.
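
The link between the conjectured regret and the conjectured query complexity can be sketched with the standard online-to-batch argument. This is a heuristic under stated assumptions, not a derivation taken from the paper: the constant $C$, the averaged point $\bar{x}_T$, and the regret symbol $R_T$ are notation introduced here for illustration, logarithmic factors are ignored, and each round is assumed to use exactly one noisy function evaluation. If the losses are i.i.d. noisy evaluations of a fixed convex function $f$ and the expected regret satisfies $\mathbb{E}[R_T] \le C\, n^{1.5}\sqrt{T}$, then Jensen's inequality applied to the average $\bar{x}_T$ of the played points gives

\[
\mathbb{E}\, f(\bar{x}_T) - \min_{x} f(x) \;\le\; \frac{\mathbb{E}[R_T]}{T} \;\le\; \frac{C\, n^{1.5}}{\sqrt{T}},
\qquad\text{so}\qquad
\frac{C\, n^{1.5}}{\sqrt{T}} \le \varepsilon
\;\Longleftrightarrow\;
T \;\ge\; \frac{C^{2}\, n^{3}}{\varepsilon^{2}} .
\]

This only accounts for the upper-bound direction; the claim that $n^{3}/\varepsilon^{2}$ queries are also necessary is the part of the conjecture that this sketch does not address.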

Title: Kernel-based Methods for Bandit Convex Optimization
Authors: Sébastien Bubeck, Ronen Eldan, Yin Tat Lee
Published: Journal of the ACM, 2021 