Kernel $\epsilon$-Greedy for Contextual Bandits

Jun, 2023

Sakshi Arya, Bharath K. Sriperumbudur

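TL;DR: This work proposes a kernel-based strategy for contextual bandits that estimates the reward functions with an online weighted kernel ridge regression estimator. It proves the consistency of this estimator under certain conditions and shows that sublinear and optimal regret rates can be achieved for any kernel and its corresponding RKHS.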
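To make the idea concrete, here is a minimal sketch of an $\epsilon$-greedy loop driven by per-arm kernel ridge regression. It is an illustration under stated assumptions, not the authors' algorithm: it uses a Gaussian kernel, a plain (unweighted) kernel ridge regression refit from scratch each round in place of the online weighted estimator, and a hypothetical exploration schedule $\epsilon_t = t^{-1/3}$. The names `rbf_kernel`, `krr_predict`, and `run_kernel_eps_greedy` and all parameter values are invented for this example.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def krr_predict(X, y, x_new, lam=0.1, gamma=1.0):
    """Kernel ridge regression estimate at x_new from observations (X, y).

    Returns 0.0 when the arm has no data yet (an arbitrary default).
    """
    n = len(y)
    if n == 0:
        return 0.0
    K = rbf_kernel(X, X, gamma)
    # Solve (K + lam * n * I) alpha = y, the usual ridge normal equations.
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    k_new = rbf_kernel(x_new[None, :], X, gamma)  # shape (1, n)
    return float((k_new @ alpha)[0])


def run_kernel_eps_greedy(reward_fns, T=300, dim=1, noise=0.1, seed=0):
    """Simulate T rounds; reward_fns[a](x) is the mean reward of arm a."""
    rng = np.random.default_rng(seed)
    n_arms = len(reward_fns)
    X = [[] for _ in range(n_arms)]  # contexts observed per arm
    y = [[] for _ in range(n_arms)]  # rewards observed per arm
    pulls = np.zeros(n_arms, dtype=int)
    total_reward = 0.0
    for t in range(1, T + 1):
        x = rng.uniform(-1.0, 1.0, size=dim)
        eps = min(1.0, t ** (-1.0 / 3.0))  # assumed schedule, not from the paper
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))  # explore: uniform random arm
        else:
            est = [krr_predict(np.array(X[a]), np.array(y[a]), x)
                   for a in range(n_arms)]
            arm = int(np.argmax(est))  # exploit: highest estimated reward
        r = reward_fns[arm](x) + noise * rng.normal()
        X[arm].append(x)
        y[arm].append(r)
        pulls[arm] += 1
        total_reward += r
    return pulls, total_reward


if __name__ == "__main__":
    arms = [lambda x: float(np.sin(3 * x[0])), lambda x: float(np.cos(3 * x[0]))]
    pulls, total = run_kernel_eps_greedy(arms, T=300)
    print("pulls per arm:", pulls, "total reward:", round(total, 2))
```

Refitting the regression every round costs a linear solve per arm per step, which is fine for a small simulation but is exactly the cost an online update would avoid.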

Abstract

We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we consider that the mean reward functions lie in a reproducing ker