Neural Contextual Bandits without Regret

Jul, 2021

Neural Contextual Bandits without Regret

Parnian Kassraie, Andreas Krause

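TL;DR: Proposes a neural-network-based algorithm (NN-UCB) for contextual bandit problems in sequential decision making, and proves that its regret matches that of the NTK-UCB algorithm.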

Abstract

Contextual bandits are a rich model for sequential decision making given side information, with important applications, e.g., in recommender systems. We propose novel algorithms for →