BriefGPT.xyz
May, 2024
Graph Feedback Bandits with Similar Arms
Han Qi, Guo Fei, Li Zhu
TL;DR
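We study the stochastic multi-armed bandit problem with graph feedback. We establish a regret lower bound for this novel feedback structure and introduce two UCB-based algorithms: D-UCB, with a problem-independent regret upper bound, and C-UCB, with a problem-dependent upper bound. Exploiting the similarity structure, we also consider the setting in which the number of arms increases over time, provide regret upper bounds for both algorithms in that setting, and discuss how the sub-linearity of these bounds relates to the distribution of the arm means. Finally, we run experiments that confirm the theoretical results.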
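To make the feedback model concrete, here is a minimal sketch of a UCB-style learner with graph side-observations. It is not the paper's D-UCB or C-UCB, whose details are not given in this summary. It assumes Bernoulli rewards, assumes two arms are linked when their means differ by at most a hypothetical threshold `eps`, and uses the standard UCB1 index; the function names `build_similarity_graph` and `ucb_graph_feedback` are illustrative.

```python
import math
import random


def build_similarity_graph(means, eps):
    """Adjacency sets: arms i and j are linked iff |mu_i - mu_j| <= eps.

    The threshold rule is an assumption made for this sketch.
    Every arm is its own neighbour, so a pull always observes itself.
    """
    k = len(means)
    return [{j for j in range(k) if abs(means[i] - means[j]) <= eps}
            for i in range(k)]


def ucb_graph_feedback(means, eps, horizon, seed=0):
    """UCB1 index, where each pull also reveals a reward for every neighbour."""
    rng = random.Random(seed)
    k = len(means)
    neighbours = build_similarity_graph(means, eps)
    counts = [0] * k   # observations per arm (pulls plus side-observations)
    sums = [0.0] * k   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0       # cumulative pseudo-regret of the pulled arms

    for t in range(1, horizon + 1):
        unobserved = [i for i in range(k) if counts[i] == 0]
        if unobserved:
            # Make sure every arm has at least one observation first.
            arm = unobserved[0]
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        regret += best - means[arm]
        # Graph feedback: observe a Bernoulli reward for the arm and its neighbours.
        for j in neighbours[arm]:
            reward = 1.0 if rng.random() < means[j] else 0.0
            counts[j] += 1
            sums[j] += reward
    return regret, counts
```

For example, `ucb_graph_feedback([0.2, 0.25, 0.8, 0.85], eps=0.1, horizon=500)` forms two clusters of similar arms, so each pull updates the estimates of both arms in the pulled cluster. Counting observations rather than pulls in the confidence radius is the simplest way to let side-observations shrink the exploration bonus.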
Abstract
In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by clinical trials and recommendation problems, we assume that two arms are connected if and only if they are si