Sep, 2023
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors
Prateek Jaiswal, Debdeep Pati, Anirban Bhattacharya, Bani K. Mallick
TL;DR: Thompson sampling (TS) is a popular algorithm for solving multi-armed bandit problems. This paper introduces a variant called $\alpha$-TS, which replaces the standard posterior with a fractional posterior built from a tempered likelihood (the likelihood raised to a power $\alpha$), and provides both instance-dependent and instance-independent regret bounds.
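To make the tempering idea concrete, here is a minimal sketch of what $\alpha$-TS could look like in one simple case. It assumes Bernoulli rewards with independent Beta priors, a setting chosen here only for illustration; the paper's own setting and assumptions may be more general. Under that assumption, raising the Bernoulli likelihood to the power $\alpha$ turns the usual Beta(a + S, b + F) posterior into Beta(a + $\alpha$S, b + $\alpha$F), where S and F are an arm's success and failure counts. The function name `alpha_ts_bernoulli` and all of its parameters are hypothetical.

```python
import numpy as np


def alpha_ts_bernoulli(true_means, horizon, alpha=0.5,
                       prior_a=1.0, prior_b=1.0, seed=0):
    """Illustrative alpha-TS on a Bernoulli bandit (assumed setting).

    Returns the number of pulls per arm and the cumulative pseudo-regret.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size
    successes = np.zeros(n_arms)
    failures = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    best_mean = true_means.max()
    regret = 0.0

    for _ in range(horizon):
        # Fractional posterior: the observed counts are scaled by alpha,
        # which is what tempering a Bernoulli likelihood does to a Beta prior.
        samples = rng.beta(prior_a + alpha * successes,
                           prior_b + alpha * failures)
        arm = int(np.argmax(samples))

        # Draw a Bernoulli reward from the (simulated) environment.
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
        regret += best_mean - true_means[arm]

    return pulls, regret
```

Setting `alpha=1.0` recovers standard Beta-Bernoulli Thompson sampling in this sketch, while values below 1 down-weight the data and keep the sampling distribution wider, so the algorithm explores more.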