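TL;DR: We study a noise model for linear stochastic bandits and introduce an algorithm, based on weighted least-squares estimation, that minimizes regret. A geometric argument that does not depend on the noise model tightly bounds the expected regret at each time step by O(1/t), so the cumulative regret scales logarithmically.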
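The weighted least-squares step mentioned above can be illustrated with a minimal, generic sketch. It is not the authors' algorithm. The function name `weighted_least_squares`, the small ridge term `reg`, and the choice of inverse-variance weights are all assumptions made for illustration. The demo also assumes a hypothetical concrete form of the noise model, in which the noise standard deviation equals the distance between the chosen action and the unknown vector. Under inverse-variance weighting, low-noise pulls close to that vector count for more in the estimate.

```python
import numpy as np


def weighted_least_squares(actions, rewards, weights, reg=1e-6):
    """Return theta minimising sum_s w_s * (r_s - a_s @ theta)**2 + reg * ||theta||**2.

    Hypothetical helper for illustration; `reg` is an assumed ridge term.
    """
    A = np.asarray(actions, dtype=float)  # (n, d) chosen actions
    r = np.asarray(rewards, dtype=float)  # (n,) observed rewards
    w = np.asarray(weights, dtype=float)  # (n,) non-negative weights
    WA = A * w[:, None]  # each action row scaled by its weight
    # Weighted Gram matrix V = reg * I + sum_s w_s a_s a_s^T
    V = reg * np.eye(A.shape[1]) + WA.T @ A
    # Weighted response vector b = sum_s w_s r_s a_s
    b = WA.T @ r
    return np.linalg.solve(V, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 3, 200
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)  # unknown vector on the unit sphere

    # Actions drawn uniformly on the unit sphere
    A = rng.normal(size=(n, d))
    A /= np.linalg.norm(A, axis=1, keepdims=True)

    # Assumed noise model: std shrinks linearly with distance to theta
    sigma = np.linalg.norm(A - theta, axis=1)
    r = A @ theta + sigma * rng.normal(size=n)

    # Assumed inverse-variance weights, clipped so sigma -> 0 cannot divide by zero
    w = 1.0 / np.maximum(sigma**2, 1e-3)
    est = weighted_least_squares(A, r, w)
    print("estimate:", est)
    print("error:", np.linalg.norm(est - theta))
```

The link from the per-step bound to the cumulative one is plain arithmetic. Summing an expected regret of at most c/t over T rounds gives at most c(1 + ln T), because the harmonic sum satisfies 1 + 1/2 + ... + 1/T ≤ 1 + ln T.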
Abstract
We study a noise model for linear stochastic bandits for which the
subgaussian noise parameter vanishes linearly as we select actions on the unit
sphere closer and closer to the unknown vector. We introduce an al