BriefGPT.xyz
Sep, 2012
Bandits with heavy tail
Sébastien Bubeck, Nicolò Cesa-Bianchi, Gábor Lugosi
TL;DR
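This paper examines the multi-armed bandit problem when the reward distributions have moments of order 1+ε. By defining sampling strategies based on more refined estimators, such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator, it proves that second moments (finite variance) suffice to obtain regret bounds of the same order as for sub-Gaussian reward distributions.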
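To make one of the estimators named above concrete, here is a minimal sketch of the generic median-of-means idea: split the samples into blocks, average each block, and take the median of the block averages. The function name `median_of_means` and the parameter `num_blocks` are illustrative assumptions; this summary does not state how the paper chooses the number of blocks, so it is left as a free parameter here.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Generic median-of-means estimate of the mean of `samples`.

    `num_blocks` is a hypothetical free parameter in this sketch; the
    paper's own choice is not given in this summary.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    # Equal-size blocks; leftover samples at the end are dropped.
    block_size = len(samples) // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    # A single extreme reward can corrupt at most one block mean,
    # and the median ignores a minority of corrupted blocks.
    return statistics.median(block_means)


# One extreme reward among otherwise identical observations.
rewards = [1, 1, 1, 1, 1, 1000]
print(median_of_means(rewards, num_blocks=3))  # 1.0
print(sum(rewards) / len(rewards))             # 167.5 (plain empirical mean)
```

In this toy example the outlier lands in a single block, so the median of the three block means stays at 1.0 while the plain empirical mean is pulled up to 167.5. That robustness to rare large values is what makes such estimators attractive for heavy-tailed rewards.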
Abstract
The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of …