BriefGPT.xyz
Feb, 2019
Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting
Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
TL;DR
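We study contextual bandit learning with an abstract policy class and a continuous action space, and obtain two different regret bounds: one competes with a smoothed version of the policy class, and the other requires a standard Lipschitz condition. We also study the problem of adapting to unknown smoothness parameters, establish a price of adaptivity, and derive optimal adaptive algorithms that require no additional information.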
Abstract
We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds …