BriefGPT.xyz
Mar, 2023
Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors
Björn Lindenberg, Karl-Olof Lindahl
TL;DR
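This paper applies Thompson sampling with Bayesian inference to multi-armed bandit problems under a normally distributed reward model. It uses multivariate normal-gamma priors to represent the environment uncertainty over all relevant parameters, and derives a Bayesian regret bound for Thompson sampling, provided that the 5/2-moment of the variance distribution exists.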
Abstract
We consider Thompson sampling for linear bandit problems with finitely many independent arms, where rewards are sampled from normal distributions that are linearly dependent on unknown parameter vectors and with
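To make the setting concrete, here is a minimal Python sketch of Thompson sampling with a normal-gamma prior per arm. It is an illustration under stated assumptions, not the authors' algorithm: the class name `NormalGammaArm`, the function `thompson_step`, the prior hyperparameters (zero mean, identity precision, `a0 = b0 = 1`), and the use of one feature vector per arm per round are all hypothetical choices. The posterior update is the standard conjugate update for Bayesian linear regression with unknown noise precision.

```python
import numpy as np


class NormalGammaArm:
    """Posterior for one arm: tau ~ Gamma(a, rate=b), theta | tau ~ N(mu, (tau * lam)^-1)."""

    def __init__(self, dim, a0=1.0, b0=1.0):
        # Assumed prior hyperparameters; the source does not specify them.
        self.mu = np.zeros(dim)   # posterior mean of the parameter vector
        self.lam = np.eye(dim)    # precision matrix (up to the factor tau)
        self.a = a0               # gamma shape
        self.b = b0               # gamma rate

    def sample_mean_reward(self, x, rng):
        """Draw (tau, theta) from the posterior and return the sampled mean reward x @ theta."""
        tau = rng.gamma(shape=self.a, scale=1.0 / self.b)
        cov = np.linalg.inv(self.lam) / tau
        theta = rng.multivariate_normal(self.mu, cov)
        return float(x @ theta)

    def update(self, x, reward):
        """Conjugate normal-gamma update after observing `reward` for feature vector `x`."""
        lam_new = self.lam + np.outer(x, x)
        mu_new = np.linalg.solve(lam_new, self.lam @ self.mu + x * reward)
        self.a += 0.5
        self.b += 0.5 * (
            reward**2 + self.mu @ self.lam @ self.mu - mu_new @ lam_new @ mu_new
        )
        self.mu, self.lam = mu_new, lam_new


def thompson_step(arms, features, rng):
    """Sample a mean reward for every arm and return the index of the largest draw."""
    draws = [arm.sample_mean_reward(x, rng) for arm, x in zip(arms, features)]
    return int(np.argmax(draws))
```

A round then consists of calling `thompson_step(arms, features, rng)` with `rng = np.random.default_rng()`, playing the returned arm, and passing the observed reward to that arm's `update`. Sampling the precision `tau` together with `theta` is what lets the posterior carry uncertainty about the reward variance as well as the mean.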