BriefGPT.xyz
May, 2022
Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
Avishek Ghosh, Abishek Sankararaman
TL;DR
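This paper proposes LR-SCB, a low-regret algorithm for stochastic contextual bandits. By exploiting the stochasticity of the contexts and combining parameter estimation with regret minimization, it achieves polylogarithmic regret, and experiments confirm that the regret under stochastic contexts does scale polylogarithmically.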
Abstract
We prove an instance-independent (poly)logarithmic regret for stochastic contextual bandits with linear payoff. Previously, in \cite{chu2011contextual}, a lower bound of $\mathcal{O}(\sqrt{T})$ is shown for the