BriefGPT.xyz
Sep, 2013
Finite-Time Analysis of Kernelised Contextual Bandits
Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nello Cristianini
TL;DR
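This paper proposes KernelUCB, an algorithm for online reward maximisation in problems where the number of actions is very large but the actions are related through similarities. It applies to any reward function that is linear in a reproducing kernel Hilbert space.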
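To make the idea in the TL;DR concrete, here is a minimal sketch of a generic kernelised upper-confidence-bound rule: a kernel ridge regression estimate of each action's reward plus an exploration bonus that shrinks near contexts already tried. It is an illustration under stated assumptions, not necessarily the exact procedure of the paper. The function names, the Gaussian kernel, and the parameters `reg`, `explore`, and `bandwidth` are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian similarity between two context vectors (an assumed kernel choice)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))


def kernel_ucb_scores(contexts, past_contexts, past_rewards,
                      reg=1.0, explore=1.0, kernel=rbf_kernel):
    """Return one optimistic score per candidate context.

    score(x) = kernel-ridge mean(x) + explore / sqrt(reg) * width(x)
    where width(x) shrinks as x gets similar to contexts already played.
    """
    contexts = np.asarray(contexts, dtype=float)
    n = len(past_contexts)
    if n == 0:
        # No observations yet: the score is purely the exploration bonus.
        return np.array([explore / np.sqrt(reg) * np.sqrt(kernel(x, x))
                         for x in contexts])

    past = np.asarray(past_contexts, dtype=float)
    y = np.asarray(past_rewards, dtype=float)
    # Regularised Gram matrix of the contexts played so far.
    gram = np.array([[kernel(u, v) for v in past] for u in past])
    gram_reg = gram + reg * np.eye(n)

    scores = []
    for x in contexts:
        k_x = np.array([kernel(x, u) for u in past])
        solved = np.linalg.solve(gram_reg, k_x)   # (K + reg I)^{-1} k_x
        mean = float(solved @ y)                  # kernel ridge prediction
        var = kernel(x, x) - float(k_x @ solved)  # remaining uncertainty
        width = np.sqrt(max(var, 0.0))            # guard against round-off
        scores.append(mean + explore / np.sqrt(reg) * width)
    return np.array(scores)


def choose_action(contexts, past_contexts, past_rewards, **kwargs):
    """Pick the index of the candidate with the highest optimistic score."""
    return int(np.argmax(kernel_ucb_scores(contexts, past_contexts,
                                           past_rewards, **kwargs)))
```

In use, each round would call `choose_action` on the available contexts, observe the reward of the chosen action, and append that context and reward to the history. Solving against the full Gram matrix every round is the simplest form; an incremental update of the inverse would be the natural optimisation for long runs.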
Abstract
We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. Howe