No Regrets for Learning the Prior in Bandits

Jul, 2021

No Regrets for Learning the Prior in Bandits

Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

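TL;DR: The paper proposes AdaTS, a Thompson sampling algorithm that adapts sequentially to the bandit tasks it interacts with. AdaTS adapts to an unknown task prior by maintaining a distribution over the prior's parameters, and it accounts for that uncertainty when solving each bandit task. AdaTS is a fully Bayesian algorithm that can be implemented efficiently in several classes of bandit problems. Its Bayes regret bounds quantify the loss incurred by not knowing the task prior. Experiments show that AdaTS performs well on challenging real-world problems and outperforms prior algorithms.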
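The idea of maintaining a distribution over the parameters of the task prior can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a model chosen purely for illustration: independent Gaussian arms, a hyper-prior N(mu_q, var_q) over each arm's unknown prior mean, task means drawn with known variance var_0, and rewards with known noise variance var_r. The function names (update_meta_posterior, run_task, adats_sketch) and all parameter names are hypothetical.

```python
import random


def update_meta_posterior(mu_q, var_q, n, reward_sum, var_0, var_r):
    """Update the Gaussian posterior N(mu_q, var_q) over one arm's prior mean
    after a task in which that arm was pulled n times (assumed Gaussian model)."""
    if n == 0:
        return mu_q, var_q  # no evidence about this arm in this task
    # With the task mean marginalized out, the arm's sample mean is
    # distributed as N(prior mean, var_0 + var_r / n).
    obs_var = var_0 + var_r / n
    post_prec = 1.0 / var_q + 1.0 / obs_var
    post_mean = (mu_q / var_q + (reward_sum / n) / obs_var) / post_prec
    return post_mean, 1.0 / post_prec


def run_task(mu_q, var_q, theta, horizon, var_0, var_r, rng):
    """Run Thompson sampling on one task, using a task prior that already
    includes the remaining uncertainty about the prior mean."""
    k = len(theta)
    prior_mean = list(mu_q)
    prior_var = [v + var_0 for v in var_q]  # marginal prior over the task mean
    counts, sums, regret = [0] * k, [0.0] * k, 0.0
    best = max(theta)
    for _ in range(horizon):
        samples = []
        for i in range(k):
            prec = 1.0 / prior_var[i] + counts[i] / var_r
            mean = (prior_mean[i] / prior_var[i] + sums[i] / var_r) / prec
            samples.append(rng.gauss(mean, (1.0 / prec) ** 0.5))
        arm = max(range(k), key=samples.__getitem__)
        reward = rng.gauss(theta[arm], var_r ** 0.5)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - theta[arm]
    return counts, sums, regret


def adats_sketch(num_tasks=20, horizon=50, k=3, var_0=0.1, var_r=1.0, seed=0):
    """Solve a sequence of tasks, refining the distribution over the prior mean."""
    rng = random.Random(seed)
    mu_q, var_q = [0.0] * k, [1.0] * k
    # Unknown prior mean, drawn once from the hyper-prior (simulation only).
    mu_star = [rng.gauss(m, v ** 0.5) for m, v in zip(mu_q, var_q)]
    total_regret = 0.0
    for _ in range(num_tasks):
        theta = [rng.gauss(m, var_0 ** 0.5) for m in mu_star]  # new task
        counts, sums, regret = run_task(mu_q, var_q, theta, horizon, var_0, var_r, rng)
        total_regret += regret
        for i in range(k):
            mu_q[i], var_q[i] = update_meta_posterior(
                mu_q[i], var_q[i], counts[i], sums[i], var_0, var_r
            )
    return mu_q, var_q, total_regret


if __name__ == "__main__":
    print(adats_sketch())
```

In this sketch the variance var_q shrinks as more tasks are observed, so later tasks start from a task prior that is closer to the true one. The Gaussian conjugate updates are a convenience of the assumed model, not a claim about how the paper handles other bandit classes.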

Abstract

We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with. The key idea in ${\tt AdaTS}$ is to adapt to an unknown task prior distribution by maintaining a distribution over its parameters. When solving a bandit task, that u