Risk-Averse Mean-Variance Multi-Armed Bandit Problems

Apr, 2016

Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure

Sattar Vakili, Qing Zhao

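TL;DR: This paper studies risk-averse multi-armed bandit problems in which the mean-variance of the reward serves as the risk measure. It shows that variants of the UCB and DSEE policies achieve optimal performance under this measure, and it establishes lower bounds on both the model-specific and the model-independent regret.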
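To make the risk measure concrete, here is a minimal sketch of an empirical mean-variance score for comparing arms. It assumes the common convention that the mean-variance of a reward is its variance minus `rho` times its mean, with lower values preferred and `rho > 0` acting as a risk-tolerance weight. The function names, the `rho` parameter, and the sample data are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def empirical_mean_variance(rewards, rho=1.0):
    """Empirical mean-variance of observed rewards: variance - rho * mean.

    Assumed convention: lower is better, and a larger rho places more
    weight on the mean reward relative to its variance.
    """
    return pvariance(rewards) - rho * fmean(rewards)


def pick_arm(samples_by_arm, rho=1.0):
    """Return the arm whose observed rewards have the smallest mean-variance."""
    return min(
        samples_by_arm,
        key=lambda arm: empirical_mean_variance(samples_by_arm[arm], rho),
    )


# Hypothetical reward samples: one steady arm, one with a higher mean but high variance.
samples = {
    "steady": [1.0, 1.0, 1.0, 1.0],
    "risky": [0.0, 4.0, 0.0, 4.0],
}

risk_averse_choice = pick_arm(samples, rho=1.0)
risk_tolerant_choice = pick_arm(samples, rho=10.0)
```

With a small `rho` the variance term dominates and the steady arm is preferred. With a large `rho` the higher-mean arm wins, which is the trade-off a risk-averse bandit policy has to learn from samples.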

Abstract

The multi-armed bandit problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in multi-armed bandit problems