Learning Neural Contextual Bandits Through Perturbed Rewards

Jan 2022

Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

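TL;DR: By updating the neural network with perturbed rewards, the algorithm eliminates explicit exploration and the computational overhead that comes with it, while still achieving an $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound under standard regularity conditions. The result is an efficient and effective neural bandit algorithm.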
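
The core idea in the TL;DR is to add random noise to the observed rewards, refit the model on these perturbed targets, and then act greedily, so that no explicit exploration bonus has to be computed. Below is a minimal, dependency-free sketch of that mechanism. It is illustrative only and is not the authors' algorithm. To keep it short, a linear reward model trained by gradient descent stands in for the neural network. The names `fit_perturbed`, `select_arm` and `simulate`, the Gaussian noise scale `noise_std`, the regularizer `lam` that pulls the weights toward their initialization `w0`, and the toy environment are all hypothetical choices.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_perturbed(history, w0, noise_std, lam, rng, steps=100, lr=0.5):
    """Fit a (hypothetical, linear) reward model to perturbed rewards.

    Runs gradient descent on
        (1/2n) * sum_s (w.x_s - (r_s + g_s))^2 + (lam/2) * ||w - w0||^2,
    where every g_s ~ N(0, noise_std^2) is redrawn on each call. The noise in
    the targets is the only source of exploration in this sketch.
    """
    dim, n = len(w0), len(history)
    targets = [r + rng.gauss(0.0, noise_std) for _, r in history]
    w = list(w0)  # start from the initialization we regularize toward
    for _ in range(steps):
        grad = [lam * (w[j] - w0[j]) for j in range(dim)]
        for (x, _), y in zip(history, targets):
            err = dot(w, x) - y
            for j in range(dim):
                grad[j] += err * x[j] / n
        w = [w[j] - lr * grad[j] for j in range(dim)]
    return w


def select_arm(w, contexts):
    """Pick the arm with the highest predicted reward (no UCB/TS bonus)."""
    scores = [dot(w, x) for x in contexts]
    return max(range(len(contexts)), key=scores.__getitem__)


def random_unit(dim, rng):
    """Draw a random unit-norm context vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(a * a for a in v) ** 0.5
    return [a / norm for a in v]


def simulate(rounds=40, arms=4, dim=3, noise_std=0.3, lam=0.1, seed=0):
    """Toy linear bandit (hypothetical); returns per-round pseudo-regret."""
    rng = random.Random(seed)
    theta = random_unit(dim, rng)  # hypothetical ground-truth parameter
    w0 = [0.0] * dim
    history, regrets = [], []
    for _ in range(rounds):
        contexts = [random_unit(dim, rng) for _ in range(arms)]
        w = fit_perturbed(history, w0, noise_std, lam, rng)  # fresh noise
        a = select_arm(w, contexts)
        expected = [dot(theta, x) for x in contexts]
        reward = expected[a] + rng.gauss(0.0, 0.1)  # noisy observed reward
        history.append((contexts[a], reward))
        regrets.append(max(expected) - expected[a])
    return regrets
```

Calling `simulate()` returns the per-round regret list. Every call to `fit_perturbed` redraws the noise on all past rewards, so the greedy choice is randomized, and this randomization plays the role that an explicit UCB- or Thompson-style exploration term would otherwise play. Swapping the linear model for a neural network updated by gradient descent would bring the sketch closer to the setting described here.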

Abstract

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvements over their classical counterparts. But because their exploration has to be per