The best of both worlds: stochastic and adversarial bandits

Feb, 2012

Sebastien Bubeck, Aleksandrs Slivkins

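TL;DR: By combining the strengths of two earlier algorithms, Exp3 and UCB1, we propose SAO, a new bandit algorithm that performs essentially optimally under both adversarial and stochastic rewards.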
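For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of the textbook forms of UCB1 (for stochastic rewards) and Exp3 (for adversarial rewards). It is not the SAO algorithm and makes no claim about how SAO combines them. The function names (`ucb1_choose`, `exp3_probs`, `exp3_update`) and the exploration parameter `gamma` are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math


def ucb1_choose(counts, sums, t):
    """UCB1 arm selection (textbook form, assumed here for illustration).

    counts[i] is how often arm i was played, sums[i] its total reward,
    and t the current round (1-based).
    """
    # Play every arm once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Pick the arm maximizing empirical mean + sqrt(2 ln t / n_i).
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i]
        + math.sqrt(2.0 * math.log(t) / counts[i]),
    )


def exp3_probs(weights, gamma):
    """Exp3 sampling distribution: weights mixed with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Exp3 weight update for the played arm (modifies weights in place).

    Uses the importance-weighted estimate reward / probs[arm], so that the
    estimate is unbiased even though only one arm is observed per round.
    """
    k = len(weights)
    estimate = reward / probs[arm]
    weights[arm] *= math.exp(gamma * estimate / k)
```

In a simulation loop, one would call `ucb1_choose` each round and update `counts` and `sums`, or sample an arm from `exp3_probs` and then call `exp3_update` with the observed reward. UCB1 relies on rewards being i.i.d. per arm, while Exp3 makes no such assumption; that contrast is the gap the TL;DR says SAO addresses.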

Abstract

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for →