Efficient Contextual Bandit Algorithms for Non-stationary Environments

Aug 2017

Efficient Contextual Bandits in Non-stationary Worlds

Haipeng Luo, Alekh Agarwal, John Langford

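TL;DR: This work develops several efficient contextual bandit algorithms for non-stationary environments that dynamically adapt to changes in the data distribution. Analyzing them under various standard notions of regret, it shows that they achieve optimal performance in both time and space cost.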
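To see why the choice of regret benchmark matters in a non-stationary environment, here is a toy illustration. It is a sketch under simplifying assumptions and not the authors' algorithm. The setting is context-free, so each "policy" simply plays one fixed arm. The hypothetical helpers `static_regret` and `dynamic_regret` compare a learner's total reward with two baselines: the best single fixed policy in hindsight, and the best choice at every round.

```python
from typing import Sequence

# Hypothetical helpers for illustration only (not from the paper).
# rewards[t][k] is the reward of arm k at round t; actions[t] is the learner's arm.
# Simplifying assumption: no contexts, so each "policy" is just a fixed arm.


def learner_reward(rewards: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Total reward collected by the learner's chosen arms."""
    return sum(rewards[t][a] for t, a in enumerate(actions))


def static_regret(rewards: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Regret against the single best fixed arm in hindsight."""
    n_arms = len(rewards[0])
    best_fixed = max(sum(r[k] for r in rewards) for k in range(n_arms))
    return best_fixed - learner_reward(rewards, actions)


def dynamic_regret(rewards: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Regret against the best arm chosen separately at every round."""
    best_each_round = sum(max(r) for r in rewards)
    return best_each_round - learner_reward(rewards, actions)


# Two stationary periods: arm 0 is best for 3 rounds, then arm 1 is best.
rewards = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
actions = [0] * 6  # a learner that never adapts

print(static_regret(rewards, actions))   # 0.0
print(dynamic_regret(rewards, actions))  # 3.0
```

Measured against the best fixed policy, the non-adaptive learner looks perfect (zero regret), because both fixed arms earn the same total. Measured against the per-round best, it loses half the achievable reward. A fixed-policy benchmark can therefore hide a failure to track distribution changes.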

Abstract

Most contextual bandit algorithms minimize regret to the best fixed policy--a questionable benchmark for non-stationary environments ubiqu