Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors

Mar, 2023

Björn Lindenberg, Karl-Olof Lindahl

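TL;DR: This paper applies Thompson sampling, using Bayesian inference under a normally distributed reward model, to multi-armed bandit problems. It represents the environmental uncertainty in all relevant parameters with a multivariate normal-gamma prior and derives a Bayesian regret bound for Thompson sampling, under the condition that the moment of order 5/2 of the variance distribution exists.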
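As a rough illustration of the idea in the TL;DR (not the authors' implementation), the following NumPy sketch runs Thompson sampling with an independent normal-gamma posterior per arm. It assumes that each round a feature vector x is observed and that arm a pays r = x·θ_a + ε with ε ~ N(0, 1/τ_a), with prior θ_a | τ_a ~ N(μ, (τ_a Λ)^-1) and τ_a ~ Gamma(α, rate β). The class and function names (`NormalGammaArm`, `thompson_step`) and the default prior values are hypothetical choices for this example. The update is the standard conjugate Bayesian linear regression update, and the regret analysis from the paper is not reproduced here.

```python
import numpy as np


class NormalGammaArm:
    """Hypothetical per-arm normal-gamma posterior (illustrative sketch).

    Assumed model: r = x @ theta + eps, eps ~ N(0, 1/tau),
    theta | tau ~ N(mu, (tau * Lam)^-1), tau ~ Gamma(alpha, rate=beta).
    """

    def __init__(self, dim, alpha=1.0, beta=1.0, prior_precision=1.0):
        self.mu = np.zeros(dim)                   # mean of theta
        self.Lam = prior_precision * np.eye(dim)  # precision scale matrix
        self.alpha = alpha                        # gamma shape
        self.beta = beta                          # gamma rate

    def sample_theta(self, rng):
        # Draw the noise precision tau first (NumPy uses scale = 1/rate).
        tau = rng.gamma(self.alpha, 1.0 / self.beta)
        # Then draw theta ~ N(mu, (tau * Lam)^-1) via a Cholesky factor Lam = L @ L.T,
        # since L^-T z / sqrt(tau) has covariance Lam^-1 / tau.
        L = np.linalg.cholesky(self.Lam)
        z = rng.standard_normal(self.mu.shape[0])
        return self.mu + np.linalg.solve(L.T, z) / np.sqrt(tau)

    def update(self, x, r):
        # Conjugate update for a single observation (x, r); uses the old mu and Lam.
        Lam_new = self.Lam + np.outer(x, x)
        mu_new = np.linalg.solve(Lam_new, self.Lam @ self.mu + x * r)
        self.beta += 0.5 * (r * r + self.mu @ self.Lam @ self.mu
                            - mu_new @ Lam_new @ mu_new)
        self.alpha += 0.5
        self.mu, self.Lam = mu_new, Lam_new


def thompson_step(arms, x, rng):
    """Sample a parameter vector for every arm and play the best predicted arm."""
    scores = [x @ arm.sample_theta(rng) for arm in arms]
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical environment: three independent arms with unknown parameters.
    true_thetas = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.6, 0.6])]
    arms = [NormalGammaArm(dim=2) for _ in true_thetas]
    pulls = [0, 0, 0]
    for _ in range(500):
        x = rng.standard_normal(2)  # feature vector observed this round
        a = thompson_step(arms, x, rng)
        r = x @ true_thetas[a] + 0.1 * rng.standard_normal()
        arms[a].update(x, r)
        pulls[a] += 1
    print("pulls per arm:", pulls)
```

Sampling τ before θ is what lets the posterior express uncertainty about the noise variance as well as the mean parameters, which is the role the normal-gamma prior plays in the setting described above.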

Abstract

We consider Thompson sampling for linear bandit problems with finitely many independent arms, where rewards are sampled from normal distributions that are linearly dependent on unknown parameter vectors and with