BriefGPT.xyz
Oct, 2019
Adaptive Exploration in Linear Contextual Bandit
Botao Hao, Tor Lattimore, Csaba Szepesvari
TL;DR
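We design an asymptotically optimal algorithm that fully exploits the linear structure and explores precisely, which reduces regret in a variety of reasonable settings. Numerical results show that our method substantially reduces regret compared with other baseline algorithms.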
Abstract
Contextual bandits serve as a fundamental model for many sequential decision making tasks. The most popular theoretically justified approaches are based on the optimism principle. While these algorithms can be pr...