Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

Nov, 2019

Dongruo Zhou, Lihong Li, Quanquan Gu

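TL;DR: We propose a new algorithm, NeuralUCB, for the stochastic contextual bandit problem. It leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound on the reward. We prove that the algorithm enjoys a near-optimal regret guarantee and show that it is competitive in practice on several benchmarks.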
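To make the idea of an upper confidence bound built from neural-network features concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a two-layer ReLU network, uses the gradient of the network output with respect to its parameters as the feature map, and replaces full retraining with a single gradient step per round. The class name `NeuralUCBSketch`, the parameters `width`, `lam`, `gamma`, `lr`, and the toy reward function are all hypothetical choices made for illustration.

```python
import numpy as np


class NeuralUCBSketch:
    """Illustrative UCB exploration with neural-network gradient features."""

    def __init__(self, dim, width=16, lam=1.0, gamma=1.0, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.m = width
        # Two-layer ReLU network f(x) = w2 . relu(W1 x), randomly initialized.
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / dim), size=(width, dim))
        self.w2 = rng.normal(0.0, np.sqrt(1.0 / width), size=width)
        self.gamma, self.lr = gamma, lr
        n_params = width * dim + width
        # Inverse of the design matrix Z, starting from Z = lam * I.
        self.Z_inv = np.eye(n_params) / lam

    def _forward(self, x):
        h = np.maximum(self.W1 @ x, 0.0)
        return float(self.w2 @ h), h

    def _grads(self, x):
        """Return (df/dW1, df/dw2) for the scalar network output."""
        _, h = self._forward(x)
        mask = (h > 0).astype(float)
        return np.outer(self.w2 * mask, x), h

    def _features(self, x):
        gW1, gw2 = self._grads(x)
        return np.concatenate([gW1.ravel(), gw2])

    def ucb(self, x):
        """Predicted reward plus an exploration bonus."""
        f, _ = self._forward(x)
        g = self._features(x)
        quad = max(float(g @ self.Z_inv @ g) / self.m, 0.0)
        return f + self.gamma * np.sqrt(quad)

    def select(self, contexts):
        """Pick the arm whose context has the largest upper confidence bound."""
        return int(np.argmax([self.ucb(x) for x in contexts]))

    def update(self, x, reward):
        # Rank-one Sherman-Morrison update of Z^{-1} for Z += g g^T / m.
        g = self._features(x) / np.sqrt(self.m)
        Zg = self.Z_inv @ g
        self.Z_inv -= np.outer(Zg, Zg) / (1.0 + g @ Zg)
        # Simplification (assumption): one gradient step on the squared error
        # of the latest observation instead of retraining on the full history.
        f, _ = self._forward(x)
        gW1, gw2 = self._grads(x)
        err = f - reward
        self.W1 -= self.lr * err * gW1
        self.w2 -= self.lr * err * gw2


# Toy interaction loop with a made-up reward function.
rng = np.random.default_rng(1)
agent = NeuralUCBSketch(dim=4)
for t in range(50):
    contexts = rng.normal(size=(3, 4))  # one context vector per arm
    arm = agent.select(contexts)
    reward = float(np.tanh(contexts[arm].sum())) + 0.1 * rng.normal()
    agent.update(contexts[arm], reward)
```

The exploration bonus shrinks for contexts whose gradient features resemble those already observed, so the agent gradually shifts from exploring to exploiting the network's predictions.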

Abstract

We study the stochastic contextual bandit problem, where the reward is generated from an unknown bounded function with additive noise. We propose the NeuralUCB algorithm, which leverages the representation power