BriefGPT.xyz
Mar, 2024
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits
Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou
TL;DR
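By exploiting both the variance of the reward distribution and the total variation budget, we propose two new algorithms, Restarted WeightedOFUL+ and Restarted SAVE+, which achieve tighter regret upper bounds for the non-stationary stochastic linear bandit problem. They improve on prior work in particular when the total variance of the rewards is much smaller than the number of rounds K.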
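To make the two quantities in the summary concrete, here is a minimal sketch that computes a total variation budget and a total reward variance for a synthetic drifting problem. It assumes the definitions commonly used in this literature (the budget as the sum of Euclidean distances between consecutive reward parameters, the total variance as the sum of per-round noise variances); the page itself does not spell them out, and the helper names, the drift model, and all numbers are hypothetical. It does not implement Restarted WeightedOFUL+ or Restarted SAVE+.

```python
import math
import random


def total_variation_budget(thetas):
    """Sum of Euclidean distances between consecutive parameter vectors.

    Assumed definition: B_K = sum_k ||theta_{k+1} - theta_k||_2.
    """
    return sum(math.dist(a, b) for a, b in zip(thetas, thetas[1:]))


def total_variance(sigmas):
    """Sum of per-round reward noise variances (assumed definition)."""
    return sum(s * s for s in sigmas)


def drifting_parameters(K, d, step, seed=0):
    """Hypothetical drift model: each coordinate moves by at most `step` per round."""
    rng = random.Random(seed)
    theta = [0.0] * d
    path = [list(theta)]
    for _ in range(K - 1):
        theta = [t + rng.uniform(-step, step) for t in theta]
        path.append(list(theta))
    return path


K, d, step = 1000, 5, 1e-3
thetas = drifting_parameters(K, d, step)
sigmas = [0.05] * K  # low-noise rounds: standard deviation 0.05 each

B_K = total_variation_budget(thetas)
V_K = total_variance(sigmas)
print(f"rounds K = {K}, variation budget B_K = {B_K:.3f}, total variance V_K = {V_K:.3f}")
```

In this toy setting the total variance is far below K, which is the regime where the summary says the variance-dependent bounds are tighter than bounds stated in terms of K alone.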
Abstract
We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, which…