BriefGPT.xyz
Feb, 2012
The best of both worlds: stochastic and adversarial bandits
Sebastien Bubeck, Aleksandrs Slivkins
TL;DR
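By combining the strengths of two earlier algorithms, Exp3 and UCB1, we propose SAO, a new bandit algorithm that performs optimally under both adversarial and stochastic rewards.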
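For readers unfamiliar with the two building blocks named above, here is a minimal Python sketch of the standard UCB1 index rule and the standard Exp3 weight update. It is not SAO itself: this page does not describe how SAO combines the two, so nothing below should be read as the authors' method. The function names (`ucb1_select`, `exp3_probs`, `exp3_update`) and the assumption that rewards lie in [0, 1] are illustrative choices made for this sketch.

```python
import math


def ucb1_select(counts, means, t):
    """Pick the arm with the largest UCB1 index at round t (1-based).

    counts[a] is how often arm a was played, means[a] its average reward.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def exp3_probs(weights, gamma):
    """Exp3 sampling distribution: normalized weights mixed with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Update only the played arm, using the importance-weighted reward estimate."""
    k = len(weights)
    estimate = reward / probs[arm]  # unbiased estimate of the arm's reward
    weights[arm] *= math.exp(gamma * estimate / k)
```

UCB1 relies on confidence bounds that are meaningful when rewards are stochastic, while Exp3 randomizes its choices so that an adversary cannot exploit a deterministic rule. That contrast is the tension a "best of both worlds" algorithm has to resolve.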
Abstract
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for …