BriefGPT.xyz
May, 2015
Thompson Sampling for Budgeted Multi-armed Bandits
Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu
TL;DR
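This paper extends the Thompson sampling algorithm to budget-constrained MAB. At each round it samples two numbers (a reward and a cost) from each arm's posterior distributions, pulls the arm with the largest sampled reward-to-cost ratio, and then updates the posteriors. It proves that the algorithm's distribution-dependent regret bound is logarithmic in the budget, both for Bernoulli arms and for general distributions. Simulation experiments verify the algorithm's effectiveness.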
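The selection rule described in the TL;DR can be illustrated with a minimal Python sketch. It assumes Beta(1, 1) priors on each arm's reward and cost, and it stops once the budget is used up. For observations in [0, 1] that are not binary, it assumes the common Thompson sampling trick of converting each observation into a 0/1 outcome with a Bernoulli trial before the Beta update. The TL;DR only states that general distributions are covered; it does not say how. The names `BudgetedThompsonSampling` and `simulate` are hypothetical, and this is not the authors' code.

```python
import random


class BudgetedThompsonSampling:
    """Sketch of Thompson sampling for a budgeted multi-armed bandit.

    Each arm keeps independent Beta posteriors (starting from Beta(1, 1))
    for its per-pull reward and its per-pull cost.
    """

    def __init__(self, n_arms, rng=None):
        self.rng = rng or random.Random()
        # [alpha, beta] parameters of the reward and cost posteriors per arm.
        self.reward_ab = [[1, 1] for _ in range(n_arms)]
        self.cost_ab = [[1, 1] for _ in range(n_arms)]

    def select_arm(self):
        """Sample a reward and a cost per arm; return the arm with max ratio."""
        best_arm, best_ratio = 0, float("-inf")
        for arm in range(len(self.reward_ab)):
            theta_r = self.rng.betavariate(*self.reward_ab[arm])
            theta_c = self.rng.betavariate(*self.cost_ab[arm])
            # Guard against a (practically impossible) zero cost sample.
            ratio = theta_r / max(theta_c, 1e-12)
            if ratio > best_ratio:
                best_arm, best_ratio = arm, ratio
        return best_arm

    def _bernoulli_trial(self, x):
        """Return 1 with probability x; 0/1 observations pass through unchanged."""
        return 1 if self.rng.random() < x else 0

    def update(self, arm, reward, cost):
        """Update the arm's posteriors with a reward and a cost, each in [0, 1]."""
        r = self._bernoulli_trial(reward)
        c = self._bernoulli_trial(cost)
        self.reward_ab[arm][0] += r
        self.reward_ab[arm][1] += 1 - r
        self.cost_ab[arm][0] += c
        self.cost_ab[arm][1] += 1 - c


def simulate(reward_probs, cost_probs, budget, seed=0):
    """Run the sketch on Bernoulli arms until the budget is used up.

    Assumes every cost probability is positive, so the loop terminates.
    """
    rng = random.Random(seed)
    agent = BudgetedThompsonSampling(len(reward_probs), rng)
    total_reward, pulls = 0, [0] * len(reward_probs)
    while budget > 0:
        arm = agent.select_arm()
        reward = 1 if rng.random() < reward_probs[arm] else 0
        cost = 1 if rng.random() < cost_probs[arm] else 0
        agent.update(arm, reward, cost)
        budget -= cost
        total_reward += reward
        pulls[arm] += 1
    return total_reward, pulls
```

For example, `simulate([0.5, 0.6, 0.3], [0.5, 0.4, 0.9], budget=2000)` should send most pulls to arm 1, which has the highest reward-to-cost ratio (0.6 / 0.4 = 1.5). Comparing the ratio of two posterior samples, rather than the reward alone, is what lets a cheap arm with a modest reward beat an expensive arm with a higher one.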
Abstract
Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend the Thompson sampling …