X-Armed Bandits

Jan, 2010

X-Armed Bandits

Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvári

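TL;DR: This paper proposes HOO (hierarchical optimistic optimization), an arm-selection strategy based on optimistic optimization that achieves improved regret bounds for a class of generalized stochastic bandit problems. Under certain conditions, when the arm space is the unit hypercube in a Euclidean space, the growth rate of HOO's regret is independent of the dimension of the space.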

Abstract

We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function