Apr, 2017
On Kernelized Multi-armed Bandits
Sayak Ray Chowdhury, Aditya Gopalan
TL;DR
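This paper proposes two Gaussian process-based algorithms, Improved GP-UCB (IGP-UCB) and GP-Thompson Sampling (GP-TS), together with regret bounds for each, to solve the stochastic bandit problem over a continuous set of arms. The bounds hold when the expected reward function belongs to a reproducing kernel Hilbert space (RKHS). An experimental evaluation and a comparison with existing algorithms in synthetic and real-world environments highlight the advantages of the proposed strategies.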
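To make the idea of a Gaussian process-based upper-confidence-bound strategy concrete, here is a minimal sketch of a generic GP-UCB-style loop. It is not the paper's IGP-UCB: the squared-exponential kernel, the discretized arm grid, the constant exploration weight `beta`, and all function names are assumptions chosen for illustration, whereas the paper derives its own confidence width.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D arrays of arms."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise=0.1):
    """GP posterior mean and standard deviation on x_grid (zero prior mean)."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))
    k_go = rbf_kernel(x_grid, x_obs)
    mean = k_go @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is 1 for this kernel; subtract the part explained by data.
    var = 1.0 - np.sum(k_go * np.linalg.solve(k_oo, k_go.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def run_gp_ucb(reward_fn, x_grid, rounds=30, beta=2.0, noise=0.1, seed=0):
    """Play `rounds` arms, each time maximizing mean + beta * std.

    `beta` is a fixed, hypothetical exploration weight used only for
    illustration; it is not the confidence width from the paper.
    """
    rng = np.random.default_rng(seed)
    x_obs = np.empty(0)
    y_obs = np.empty(0)
    for _ in range(rounds):
        if len(x_obs) == 0:
            # No data yet: fall back to the prior (mean 0, std 1).
            mean, std = np.zeros(len(x_grid)), np.ones(len(x_grid))
        else:
            mean, std = gp_posterior(x_obs, y_obs, x_grid, noise)
        arm = x_grid[np.argmax(mean + beta * std)]
        reward = reward_fn(arm) + noise * rng.standard_normal()
        x_obs = np.append(x_obs, arm)
        y_obs = np.append(y_obs, reward)
    return x_obs, y_obs
```

A call such as `run_gp_ucb(lambda x: np.exp(-(x - 0.7) ** 2 / 0.02), np.linspace(0, 1, 101))` returns the arms played and the noisy rewards observed. A Thompson-sampling variant would instead draw one function from the posterior each round and play its maximizer.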
Abstract
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for ...