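TL;DR: The paper proposes an algorithm that adapts to the unknown smoothness of the objective function. It exhibits and computes a polynomial cost of adaptation to Hölder regularity for regret minimization, and it provides a finite-time analysis and a thorough discussion of asymptotic optimality.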
Abstract
In the context of stochastic continuum-armed bandits, we present an algorithm
that adapts to the unknown smoothness of the objective function. We exhibit and
compute a polynomial cost of adaptation to the Hölder regularity for regret
minimization. To do this, we first reconsider th