BriefGPT.xyz
Jan, 2022
Learning Neural Contextual Bandits Through Perturbed Rewards
Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang
TL;DR
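The method updates the neural network with perturbed rewards, which eliminates explicit exploration and its computational overhead. Under standard regularity conditions it achieves an $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound, making it an efficient and effective neural bandit algorithm.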
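The idea in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: a ridge regression stands in for the neural network, and the names `fit_on_perturbed_rewards`, `run_bandit`, `perturb_std`, and `reg`, along with the simulated unit-norm contexts, are assumptions made for illustration. The point it shows is that the model is refit on rewards with fresh Gaussian noise added, and the arm is then chosen greedily, with no confidence bonus or posterior sampling.

```python
import numpy as np


def fit_on_perturbed_rewards(X, r, perturb_std, reg, rng):
    """Ridge regression on the reward history plus fresh Gaussian noise.

    Linear stand-in (assumption) for training a neural network on
    perturbed rewards.
    """
    perturbed = r + rng.normal(0.0, perturb_std, size=r.shape)
    d = X.shape[1]
    A = X.T @ X + reg * np.eye(d)
    return np.linalg.solve(A, X.T @ perturbed)


def run_bandit(theta_star, n_arms=5, horizon=500, noise_std=0.1,
               perturb_std=0.1, reg=1.0, seed=0):
    """Simulate a linear bandit and return (final estimate, cumulative regret)."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    X_hist, r_hist = [], []
    theta_hat = np.zeros(d)
    regret = 0.0
    for _ in range(horizon):
        # Hypothetical environment: random unit-norm context per arm.
        contexts = rng.normal(size=(n_arms, d))
        contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)

        # Greedy choice under the current model; the randomness injected
        # by the perturbed refit is the only source of exploration.
        arm = int(np.argmax(contexts @ theta_hat))

        expected = contexts @ theta_star
        reward = expected[arm] + rng.normal(0.0, noise_std)
        regret += expected.max() - expected[arm]

        X_hist.append(contexts[arm])
        r_hist.append(reward)

        # Refit on the whole history with newly drawn perturbations.
        theta_hat = fit_on_perturbed_rewards(
            np.array(X_hist), np.array(r_hist), perturb_std, reg, rng
        )
    return theta_hat, regret


if __name__ == "__main__":
    theta_star = np.array([0.6, -0.8, 0.0])
    theta_hat, regret = run_bandit(theta_star)
    print("estimate:", theta_hat)
    print("cumulative regret:", regret)
```

In this sketch each round costs one refit and one greedy argmax; replacing the ridge solve with a few gradient steps on a network would follow the same loop structure.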
Abstract
Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts. But because their exploration has to be per