Best-Arm Identification in Linear Bandits

Sep, 2014

Best-Arm Identification in Linear Bandits

Marta Soare, Alessandro Lazaric, Rémi Munos

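TL;DR: This paper studies best-arm identification in the linear bandit setting. It proposes sample allocation strategies that identify the best arm with a fixed confidence while minimizing the sample budget, shows that exploiting the global linear structure improves the reward estimates of near-optimal arms, and relates the approach to the G-optimality criterion used in optimal experimental design.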
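
For orientation, the setting named in the summary can be written out as a short sketch. The notation here ($\mathcal{X}$ for the finite arm set, $\eta_t$ for the noise, $\lambda$ for a design over arms, $A_\lambda$ for the design matrix) is assumed for illustration and may differ from the paper's own symbols.

```latex
% Linear reward model: pulling arm x_t from a finite set X in R^d
% returns a noisy linear function of the unknown parameter theta^*.
r_t = x_t^\top \theta^* + \eta_t, \qquad \mathbb{E}[\eta_t] = 0

% Best arm: the arm with the largest expected reward.
x^* = \arg\max_{x \in \mathcal{X}} x^\top \theta^*

% G-optimal design (optimal experimental design): choose arm proportions
% lambda that minimize the worst-case prediction variance over all arms.
A_\lambda = \sum_{x \in \mathcal{X}} \lambda(x)\, x x^\top, \qquad
\lambda^{G} \in \arg\min_{\lambda} \max_{x \in \mathcal{X}} x^\top A_\lambda^{-1} x
```

The G-optimal design controls estimation error uniformly over all arms, whereas identifying the best arm only requires telling $x^*$ apart from the other arms, which is why the two criteria are worth comparing.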

Abstract

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward. We c