BriefGPT.xyz
May, 2014
Generalized Risk-Aversion in Stochastic Multi-Armed Bandits
Alexander Zimin, Rasmus Ibsen-Jensen, Krishnendu Chatterjee
TL;DR
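Studies regret minimization in multi-armed bandits where the measure of an arm's goodness is not the mean return but some general function of the mean and the variance. The paper characterizes the conditions under which learning is possible and presents examples of cases in which natural algorithms cannot achieve sublinear regret.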
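To make the objective concrete, here is a minimal sketch of scoring arms by a function of the empirical mean and variance rather than by the mean alone. It is an illustration only, not the paper's algorithm: the names `risk_score`, `mean_variance`, and `best_arm`, the trade-off weight `rho`, and the reward samples are all assumptions introduced here, and the mean-variance form `mean - rho * variance` is just one possible choice of function.

```python
from statistics import fmean, pvariance


def risk_score(rewards, f):
    """Plug-in estimate: apply f to the empirical mean and variance of one arm."""
    return f(fmean(rewards), pvariance(rewards))


def mean_variance(rho):
    """Hypothetical objective: reward the mean, penalize the variance by rho."""
    return lambda mean, variance: mean - rho * variance


def best_arm(samples, f):
    """Return the arm whose observed rewards maximize the plug-in score."""
    return max(samples, key=lambda arm: risk_score(samples[arm], f))


# Made-up observations: "volatile" has the higher mean but also a higher variance.
samples = {
    "steady": [0.5, 0.5, 0.5, 0.5],
    "volatile": [0.0, 1.2, 0.0, 1.2],
}

risk_neutral_choice = best_arm(samples, mean_variance(0.0))
risk_averse_choice = best_arm(samples, mean_variance(1.0))
```

With `rho = 0` the score reduces to the mean and the higher-mean arm is chosen; with `rho = 1` the variance penalty makes the constant arm preferable. As the summary above notes, such a natural plug-in approach is not guaranteed to achieve sublinear regret for every choice of function.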
Abstract
We consider the problem of minimizing the regret in stochastic multi-armed bandit, when the measure of goodness of an arm is not the mean return, but some general function of the mean and the variance. We characte