BriefGPT.xyz
Jun, 2023
Kernel $\epsilon$-Greedy for Contextual Bandits
Sakshi Arya, Bharath K. Sriperumbudur
TL;DR
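This study proposes a kernel-based contextual bandit strategy that estimates the reward functions with an online weighted kernel ridge regression estimator. It proves the consistency of this estimator under certain conditions, and shows that sublinear regret rates, as well as optimal regret rates, are attainable for any kernel and its corresponding RKHS.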
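To make the strategy summarized above concrete, here is a minimal Python sketch of an $\epsilon$-greedy loop that uses a per-arm kernel ridge regression estimate of the reward. It is an illustration under stated assumptions, not the authors' algorithm: it refits a plain (unweighted) kernel ridge regression at every round instead of the online weighted estimator, and the Gaussian kernel, the bandwidth, the regularization `lam`, the exploration schedule `t ** (-1/3)`, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=0.5):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def krr_predict(X, y, x_new, lam=0.1):
    # Kernel ridge regression prediction at x_new (assumed, unweighted form).
    # Returns 0.0 when the arm has no observations yet.
    n = len(y)
    if n == 0:
        return 0.0
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return float(rbf_kernel(x_new[None, :], X)[0] @ alpha)

def run_kernel_eps_greedy(mean_rewards, T=300, noise=0.1, seed=0):
    # mean_rewards: list of functions, one per arm, mapping a context in [0, 1]
    # to the expected reward of that arm (a toy simulation setup).
    rng = np.random.default_rng(seed)
    n_arms = len(mean_rewards)
    X = [[] for _ in range(n_arms)]  # contexts observed per arm
    y = [[] for _ in range(n_arms)]  # rewards observed per arm
    pulls, regret = [], 0.0
    for t in range(1, T + 1):
        x = rng.uniform(0.0, 1.0, size=1)
        eps = min(1.0, t ** (-1.0 / 3.0))  # hypothetical decaying schedule
        if rng.random() < eps:
            # Explore: pick an arm uniformly at random.
            arm = int(rng.integers(n_arms))
        else:
            # Exploit: pick the arm with the largest estimated reward.
            est = [
                krr_predict(np.array(X[a]).reshape(-1, 1), np.array(y[a]), x)
                for a in range(n_arms)
            ]
            arm = int(np.argmax(est))
        reward = mean_rewards[arm](x[0]) + noise * rng.normal()
        X[arm].append(x[0])
        y[arm].append(reward)
        true_means = [f(x[0]) for f in mean_rewards]
        regret += max(true_means) - true_means[arm]
        pulls.append(arm)
    return regret, pulls
```

Calling `run_kernel_eps_greedy([lambda x: x, lambda x: 1.0 - x], T=300)` simulates two arms whose best choice depends on the context and returns the cumulative regret together with the sequence of pulled arms. Refitting from scratch at each round keeps the sketch short; an online update would avoid the repeated linear solves.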
Abstract
We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we consider that the mean reward functions lie in a reproducing ker
→