BriefGPT.xyz
Apr, 2013
Prior-free and prior-dependent regret bounds for Thompson Sampling
A note on the Bayesian regret of Thompson Sampling with an arbitrary prior
Sébastien Bubeck, Che-Yu Liu
TL;DR
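Studies the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. It proves that the Thompson Sampling algorithm achieves an optimal upper bound on the Bayesian regret that holds independently of the prior, and that under the prior setting of Bubeck et al. its regret is uniformly bounded. The proof techniques are related to those of Audibert and Bubeck [2009] and Russo and Van Roy [2013].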
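As a rough illustration of the quantities involved, the sketch below runs Thompson Sampling on Bernoulli arms and estimates its Bayesian regret by Monte Carlo. It assumes a specific prior (each arm's mean drawn independently from Uniform(0, 1), i.e. Beta(1, 1)), whereas the paper allows an arbitrary prior. The function names `thompson_sampling_regret` and `bayesian_regret` are hypothetical helpers for this sketch, not code from the paper. Bayesian regret here means the expected pseudo-regret, where the expectation is taken over both the draw of the arm means from the prior and the algorithm's own randomness.

```python
import random


def thompson_sampling_regret(means, horizon, rng):
    """Run Beta-Bernoulli Thompson Sampling on one bandit instance.

    means: true success probabilities of the K Bernoulli arms (unknown to the agent).
    Returns the pseudo-regret: sum over rounds of (best mean - mean of the played arm).
    """
    k = len(means)
    # Assumed Beta(1, 1) (uniform) prior on every arm; posterior kept as Beta(alpha, beta).
    alpha = [1] * k
    beta = [1] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior and play the largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        # Conjugate Beta-Bernoulli update of the played arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - means[arm]
    return regret


def bayesian_regret(k, horizon, n_instances, seed=0):
    """Monte Carlo estimate of Bayesian regret: average regret over instances drawn from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_instances):
        # Draw the arm means from the same uniform prior the algorithm assumes.
        means = [rng.random() for _ in range(k)]
        total += thompson_sampling_regret(means, horizon, rng)
    return total / n_instances


if __name__ == "__main__":
    k, horizon = 5, 1000
    estimate = bayesian_regret(k, horizon, n_instances=20)
    print(f"estimated Bayesian regret over {horizon} rounds with {k} arms: {estimate:.1f}")
```

Matching the algorithm's prior to the one used to generate instances is what makes the averaged quantity a Bayesian regret rather than a worst-case (frequentist) regret; swapping in a different instance-generating distribution would give a misspecified-prior experiment instead.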
Abstract
We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We show that for any prior distribution …