Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

February 2019

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

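TL;DR: This paper studies contextual bandit learning with an abstract policy class and a continuous action space. It obtains two qualitatively different regret bounds: one competes with a smoothed version of the policy class, and the other requires standard Lipschitz conditions. It also studies adaptation to unknown smoothness parameters, establishes a price of adaptivity, and derives optimal adaptive algorithms that require no additional information.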

Abstract

We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bound