OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits

May, 2019

OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits

Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

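TL;DR: We design an algorithm that simultaneously achieves the optimal problem-dependent regret rate in the simple multi-armed bandit regime and the minimax-optimal regret rate in the linear contextual bandit regime, without knowing in advance which model generates the rewards.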

Abstract

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for their alternate regime. We design a single computat