Finite-Time Analysis of Kernelised Contextual Bandits

Sep 2013

Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nello Cristianini

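TL;DR: This paper proposes an approach based on the KernelUCB algorithm for online reward maximisation in problems where the number of actions is very large but the actions are related through a notion of similarity. It applies to any reward function that is linear in a reproducing kernel Hilbert space.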

Abstract

We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. Howe