Sep, 2023

Generalized Regret Analysis of Thompson Sampling Using Fractional Posteriors

TL;DR: Thompson sampling (TS) is a popular algorithm for solving multi-armed bandit problems. This paper introduces a variant called $\alpha$-TS, which replaces the standard posterior with a fractional posterior built from a tempered likelihood (the likelihood raised to the power $\alpha$), and provides both instance-dependent and instance-independent regret bounds.
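To make the idea of a tempered likelihood concrete, here is a minimal sketch of what an $\alpha$-TS loop could look like. It assumes Bernoulli rewards and a Beta(1, 1) prior on each arm, neither of which is stated in the summary above; the function name `alpha_ts_bernoulli` and its parameters are hypothetical. Under these assumptions, raising the likelihood to the power $\alpha$ scales the success and failure counts by $\alpha$, so the fractional posterior stays a Beta distribution.

```python
import numpy as np


def alpha_ts_bernoulli(true_means, horizon, alpha=0.5, seed=0):
    """Illustrative Thompson sampling with a fractional (tempered) posterior.

    Assumptions (not from the paper): Bernoulli rewards, Beta(1, 1) priors.
    Returns the pull count per arm and the cumulative expected regret.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = len(true_means)
    successes = np.zeros(n_arms)
    failures = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    regret = 0.0

    for _ in range(horizon):
        # Likelihood^alpha times a Beta(1, 1) prior gives
        # Beta(1 + alpha * successes, 1 + alpha * failures).
        samples = rng.beta(1.0 + alpha * successes, 1.0 + alpha * failures)
        arm = int(np.argmax(samples))

        # Draw a Bernoulli reward from the chosen arm.
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1

        # Expected regret of this pull relative to the best arm.
        regret += true_means.max() - true_means[arm]

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = alpha_ts_bernoulli([0.1, 0.9], horizon=2000, alpha=0.5)
    print("pulls per arm:", pulls, "cumulative regret:", regret)
```

Setting `alpha=1.0` recovers standard Thompson sampling in this sketch, while smaller values keep the posterior wider and so sample more diffusely around each arm's estimate.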