BriefGPT.xyz
Feb, 2020
Thompson Sampling Algorithms for Mean-Variance Bandits
Qiuyu Zhu, Vincent Y. F. Tan
TL;DR
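This paper proposes Thompson Sampling algorithms for the mean-variance MAB problem and provides comprehensive regret analyses for Gaussian and Bernoulli bandits under fewer assumptions. The algorithms achieve the best known regret bounds across a range of parameter configurations.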
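To make the idea concrete, here is a minimal sketch of Thompson Sampling with a mean-variance objective for Bernoulli arms. It is an illustration under stated assumptions, not the authors' algorithm: the objective `rho * mean - variance` (larger is better), the uniform Beta(1, 1) priors, and the names `mean_variance` and `mv_thompson_bernoulli` are all assumptions introduced here and do not appear in the summary above.

```python
import random


def mean_variance(mu, var, rho):
    # Assumed objective: trade off reward (mu) against risk (var).
    # rho is a hypothetical risk-tolerance weight; larger rho favors the mean.
    return rho * mu - var


def mv_thompson_bernoulli(true_means, rho, horizon, seed=0):
    """Run a sketch of mean-variance Thompson Sampling on Bernoulli arms.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1.0] * k   # Beta posterior parameter: 1 + observed failures
    pulls = [0] * k

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # A Bernoulli arm with mean t has variance t * (1 - t).
        scores = [mean_variance(t, t * (1.0 - t), rho) for t in samples]
        arm = max(range(k), key=scores.__getitem__)

        # Simulate the reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls
```

For instance, `mv_thompson_bernoulli([0.2, 0.5, 0.9], rho=1.0, horizon=1000)` returns a list of three pull counts that sum to 1000. With a small `rho`, the score penalizes arms whose mean is near 0.5, since a Bernoulli variance peaks there.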
Abstract
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff. However, standard formulations do not take into account risk. In online decision making sy