An optimal algorithm for bandit convex optimization

Mar, 2016

Elad Hazan, Yuanzhi Li

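TL;DR: This paper studies online convex optimization with bandit feedback, known as bandit convex optimization. By applying the ellipsoid method to online learning, it gives the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting, and its analysis introduces new tools in discrete convex geometry.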
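
For reference, the regret mentioned above is usually defined as follows. This is a sketch in standard notation, which is an assumption here rather than something taken from the summary: $K$ is the convex decision set, $f_t$ is the convex loss chosen by the adversary in round $t$, and $x_t \in K$ is the learner's point in that round.

$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t(x)
$$

Under bandit feedback the learner observes only the scalar $f_t(x_t)$ after each round, not the function $f_t$ or its gradient. The $\tilde{O}(\cdot)$ notation hides factors logarithmic in $T$.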

Abstract

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-→