Jul, 2015
Optimally Confident UCB: Improved Regret for Finite-Armed Bandits
Tor Lattimore
TL;DR
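Proposes a UCB-based algorithm for stochastic finite-armed bandits whose confidence parameter is chosen to balance the risk of failure against the cost of excessive optimism. The algorithm simultaneously achieves optimal problem-dependent regret and optimal worst-case regret.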
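To make the role of a confidence parameter concrete, here is a minimal sketch of a generic UCB rule on a simulated Bernoulli bandit. It is not the algorithm from the paper: the index formula is the textbook one, and the names `ucb_index`, `run_ucb`, and `alpha` are hypothetical, chosen only for illustration. A larger `alpha` widens the bonus (more optimism, more exploration); a smaller one raises the risk of settling on a suboptimal arm.

```python
import math
import random


def ucb_index(mean, pulls, t, alpha):
    """Empirical mean plus a confidence bonus; larger alpha means more optimism."""
    return mean + math.sqrt(alpha * math.log(t) / pulls)


def run_ucb(arm_means, horizon, alpha=2.0, seed=0):
    """Play a simulated Bernoulli bandit with a generic UCB rule; return pull counts."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    k = len(arm_means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so that each index is defined
        else:
            arm = max(
                range(k),
                key=lambda i: ucb_index(totals[i] / pulls[i], pulls[i], t, alpha),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    # Two arms with a large gap: the better arm should receive most of the pulls.
    print(run_ucb([0.1, 0.9], horizon=2000))
```

Varying `alpha` in this sketch shows the trade-off the summary refers to: the number of pulls spent on the worse arm grows with the size of the confidence bonus.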
Abstract
I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and wo