A Simple Solution to the Non-stationary Linear Bandit Problem

Mar 2021

Non-stationary Linear Bandits Revisited

Peng Zhao, Lijun Zhang

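TL;DR: This paper studies the non-stationary linear bandit problem. It proposes a restart-based algorithm that balances exploitation and exploration, proves a dynamic regret bound for it, and also addresses a serious technical flaw in existing algorithms.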
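The TL;DR names the method only as a restart strategy. A minimal sketch of that general idea, under stated assumptions, is a restarted LinUCB: ridge-regression LinUCB whose statistics are wiped every H rounds so that data collected under an outdated parameter stops biasing the estimate. The names `restarted_linucb`, `H`, `lam` and `beta` are hypothetical. The fixed restart period and constant confidence multiplier are simplifications, and this is not claimed to be the authors' algorithm or their tuning of the restart period.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [dot(row, v) for row in m]


def restarted_linucb(arms, reward_fn, T, H, lam=1.0, beta=1.0):
    """Hypothetical sketch of LinUCB that restarts every H rounds.

    arms      -- fixed list of feature vectors (lists of floats)
    reward_fn -- reward_fn(t, x) returns the observed reward of arm x at round t
    H         -- restart period (assumed fixed and positive)
    lam, beta -- ridge regularizer and confidence-width multiplier (assumed constants)
    Returns the index of the arm played at each round.
    """
    d = len(arms[0])
    choices = []
    for t in range(T):
        if t % H == 0:
            # Restart (always triggered at t = 0): discard all history so stale
            # data from an outdated parameter no longer biases the estimate.
            v_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
            b = [0.0] * d
        theta_hat = matvec(v_inv, b)  # ridge estimate V^{-1} b
        best, best_ucb = 0, -math.inf
        for k, x in enumerate(arms):
            # Optimistic index: estimated reward plus beta * ||x||_{V^{-1}}.
            ucb = dot(theta_hat, x) + beta * math.sqrt(dot(x, matvec(v_inv, x)))
            if ucb > best_ucb:
                best, best_ucb = k, ucb
        x = arms[best]
        r = reward_fn(t, x)
        # Sherman-Morrison update of V^{-1} after V <- V + x x^T (V is symmetric).
        vx = matvec(v_inv, x)
        denom = 1.0 + dot(x, vx)
        v_inv = [[v_inv[i][j] - vx[i] * vx[j] / denom for j in range(d)] for i in range(d)]
        b = [b[i] + r * x[i] for i in range(d)]
        choices.append(best)
    return choices


if __name__ == "__main__":
    # Toy drift: theta_t jumps from (1, 0) to (0, 1) at round 200; rewards are noise-free.
    def reward(t, x):
        theta = (1.0, 0.0) if t < 200 else (0.0, 1.0)
        return dot(theta, x)

    picks = restarted_linucb([[1.0, 0.0], [0.0, 1.0]], reward, T=400, H=50)
    print("plays of arm 1 after the change:", sum(picks[200:]))
```

The restart period is where the exploitation/exploration trade-off shows up: a short H reacts quickly to drift but re-explores often, while a long H reuses more data but keeps trusting a stale estimate. In the noise-free toy run, setting H equal to T (no restart) makes the sketch keep playing the outdated arm for longer after the change than H = 50 does.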

Abstract

In this note, we revisit non-stationary linear bandits, a variant of stochastic linear bandits with a time-varying underlying regression parameter. Existing studies develop various algorithms and show that they enjoy an $\widetilde{O}(T^{2/3}(1+P_T)^{1/3})$ …
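The excerpt breaks off before defining the quantities in this bound. For orientation, the usual definitions in the non-stationary bandit literature (assumed here, not quoted from the note) are the dynamic regret, measured against the best arm of each round, and the path length $P_T$ of the parameter sequence. With decision set $\mathcal{X}$, played arm $x_t$ and unknown parameter $\theta_t$ at round $t$:

$$
\mathrm{D\text{-}Regret}_T = \sum_{t=1}^{T} \max_{x \in \mathcal{X}} x^\top \theta_t - \sum_{t=1}^{T} x_t^\top \theta_t,
\qquad
P_T = \sum_{t=2}^{T} \lVert \theta_{t-1} - \theta_t \rVert_2 .
$$

Since $T^{2/3}(1+P_T)^{1/3} = o(T)$ whenever $P_T = o(T)$, a bound of this form is sublinear in $T$ as long as the total drift of the parameter grows sublinearly.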