BriefGPT.xyz
Apr, 2016
Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure
Sattar Vakili, Qing Zhao
TL;DR
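This paper studies risk-averse multi-armed bandit problems that use the mean-variance of the rewards as the risk measure. It establishes lower bounds on both the model-specific and the model-independent regret, and shows that the UCB and DSEE policies can achieve optimal performance in terms of reward.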
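To make the risk measure concrete, here is a minimal Python sketch of an empirical mean-variance score. It assumes the form "variance minus rho times mean" that is common in the mean-variance bandit literature; this summary does not state the formula, so treat that form, the risk-tolerance parameter `rho`, and the function names `empirical_mean_variance` and `least_risky_arm` as illustrative assumptions rather than the authors' definitions.

```python
from statistics import fmean, pvariance


def empirical_mean_variance(rewards, rho):
    """Assumed score: sample variance minus rho times sample mean.

    Lower is better under this convention: low variance and high mean
    both reduce the score. rho > 0 is a hypothetical risk-tolerance knob.
    """
    return pvariance(rewards) - rho * fmean(rewards)


def least_risky_arm(samples_per_arm, rho):
    """Return the index of the arm with the smallest empirical score."""
    scores = [empirical_mean_variance(r, rho) for r in samples_per_arm]
    return min(range(len(scores)), key=scores.__getitem__)


# Two arms with the same sample mean but different spread.
steady = [1.0, 1.0, 1.0, 1.0]
volatile = [0.0, 2.0, 0.0, 2.0]
best = least_risky_arm([steady, volatile], rho=1.0)
```

Both arms have the same average reward here, so a policy that only tracks expected reward cannot tell them apart, while the assumed mean-variance score prefers the steadier arm.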
Abstract
The multi-armed bandit problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in multi-armed bandit problems