In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not only achieves nearly instance-optimal regret in stochastic environments but also works in corrupted environments, where its additional regret is on the order of the amount of corruption. In contrast, the state of the art (Li et al., 2019) achieves neither instance-optimality nor the optimal dependence on the corruption amount. Moreover, by equipping this algorithm with an adversarial component and carefully designed tests, our second algorithm additionally enjoys minimax-optimal regret in completely adversarial environments, which, to our knowledge, is the first result of its kind. Finally, all our guarantees hold with high probability, whereas existing instance-optimal guarantees hold only in expectation.
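For reference, a commonly used form of this optimization problem in the linear bandit literature is sketched below. The notation is ours and the exact formulation used by our algorithms may differ. We assume a finite action set $\mathcal{X}\subset\mathbb{R}^d$, an unknown loss vector $\theta$, an optimal action $x^\star$, gaps $\Delta_x = \langle x - x^\star, \theta\rangle$, and an allocation $\alpha$ assigning a nonnegative (normalized) number of pulls to each action:

```latex
c(\mathcal{X},\theta) \;=\; \inf_{\alpha \in [0,\infty)^{\mathcal{X}}} \; \sum_{x \in \mathcal{X}} \alpha(x)\,\Delta_x
\quad \text{s.t.} \quad
\|x\|^2_{H(\alpha)^{-1}} \le \frac{\Delta_x^2}{2} \;\; \text{for all } x \text{ with } \Delta_x > 0,
\qquad
H(\alpha) = \sum_{x \in \mathcal{X}} \alpha(x)\, x x^\top .
```

Under this formulation, the regret of any consistent algorithm in a stochastic environment is asymptotically at least $c(\mathcal{X},\theta)\log T$, which is the benchmark against which instance-optimality is measured.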

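This paper develops linear bandit algorithms that adapt to different environments and proposes a novel loss estimator. The resulting algorithm achieves nearly instance-optimal regret in stochastic environments and also works in corrupted environments at the cost of additional regret. Equipped with an adversarial component, it additionally enjoys minimax-optimal regret in adversarial environments.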

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously