Generalized Risk-Aversion in Stochastic Multi-Armed Bandits

May, 2014
Alexander Zimin, Rasmus Ibsen-Jensen, Krishnendu Chatterjee

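TL;DR: Studies regret minimization in multi-armed bandits where the measure of goodness of an arm is not its mean return but some general function of its mean and variance; characterizes the conditions under which learning is possible, and gives examples in which no natural algorithm can achieve sublinear regret.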
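
To make the setting concrete, here is a minimal sketch of ranking arms by a function of their empirical mean and variance rather than by mean alone. The criterion `mean - rho * variance`, the parameter `rho`, and all function names are assumptions chosen for illustration; the paper treats general functions of the mean and the variance, and this is not presented as its algorithm.

```python
from statistics import fmean, pvariance


def mean_variance_score(mean, variance, rho=1.0):
    """Hypothetical risk-averse criterion: mean penalized by rho * variance."""
    return mean - rho * variance


def best_arm(samples_per_arm, score=mean_variance_score):
    """Return the index of the arm whose empirical (mean, variance) scores highest."""
    scores = [score(fmean(s), pvariance(s)) for s in samples_per_arm]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative data (made up): arm 0 has the higher mean but is volatile,
# arm 1 has a lower mean but constant rewards.
samples = [[0.0, 2.0, 0.0, 2.0], [0.8, 0.8, 0.8, 0.8]]

risk_averse_choice = best_arm(samples)
mean_only_choice = best_arm(samples, score=lambda m, v: m)
```

With these made-up samples the two criteria disagree: the variance-penalized score prefers the steady arm, while the mean-only score prefers the volatile one. This is why regret has to be defined relative to the chosen function of mean and variance.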

Abstract

We consider the problem of minimizing the regret in stochastic multi-armed bandits, when the measure of goodness of an arm is not the mean return, but some general function of the mean and the variance. We characte