Sep, 2012

Bandits with heavy tail

Sébastien Bubeck, Nicolò Cesa-Bianchi, Gábor Lugosi

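TL;DR: This paper studies the multi-armed bandit problem when the reward distributions have moments of order 1+ε. By defining sampling strategies based on refined estimators of the mean, such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator, it shows that moments of order 2 (finite variance) suffice to obtain regret bounds of the same order as under sub-Gaussian reward distributions.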
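To make one of the estimators named in the TL;DR concrete, here is a minimal Python sketch of a median-of-means estimate. The function name `median_of_means` and the parameter `num_blocks` are illustrative assumptions rather than the paper's notation, and the paper's rule for choosing the number of blocks is not reproduced here.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Estimate a mean by taking the median of per-block averages.

    Hypothetical helper for illustration: samples are split into
    `num_blocks` contiguous blocks of equal size (leftover samples at
    the end are dropped), each block is averaged, and the median of
    those averages is returned.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# A single extreme reward shifts the plain average a lot,
# but only affects one block mean here.
rewards = [1, 1, 1, 1, 1, 1000]
plain_mean = sum(rewards) / len(rewards)
robust_mean = median_of_means(rewards, num_blocks=3)
```

With these rewards, the outlier dominates the plain average, while the median of the three block means ignores the one contaminated block. This is the intuition for why such estimators suit heavy-tailed rewards.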

Abstract

The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of