BriefGPT.xyz
Jul, 2021
Neural Contextual Bandits without Regret
Parnian Kassraie, Andreas Krause
TL;DR
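Proposes a neural-network-based algorithm (NN-UCB) for the contextual bandit problem in sequential decision making, and proves that its regret matches that of the NTK-UCB algorithm.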
Abstract
Contextual bandits are a rich model for sequential decision making given side information, with important applications, e.g., in recommender systems. We propose novel algorithms for …