BriefGPT.xyz
Feb, 2024
Linear bandits with polylogarithmic minimax regret
Josep Lumbreras, Marco Tomamichel
TL;DR
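The paper studies a noise model for linear stochastic bandits and introduces an algorithm based on weighted least-squares estimation that minimizes regret. Using a geometric argument that does not depend on the noise model, it tightly controls the expected regret at each time step at O(1/t), which yields logarithmic scaling of the cumulative regret.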
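As a brief sketch of why a per-step bound of order 1/t gives logarithmic cumulative regret: the symbols below (r_t for the regret at step t, C for a constant, T for the horizon) are notation assumed here for illustration and are not taken from the paper.

```latex
% Assumption: the expected regret at step t satisfies E[r_t] <= C / t.
\[
  \mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big]
  \;\le\; C \sum_{t=1}^{T} \frac{1}{t}
  \;\le\; C \Big(1 + \int_{1}^{T} \frac{dx}{x}\Big)
  \;=\; C\,(1 + \ln T).
\]
```

If the per-step bound carried additional logarithmic factors, the same integral comparison would give a polylogarithmic total instead.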
Abstract
We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an al