BriefGPT.xyz
Sep, 2017
The Uncertainty Bellman Equation and Exploration
Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih
TL;DR
We consider the exploration/exploitation problem in reinforcement learning and propose the uncertainty Bellman equation (UBE), which propagates the uncertainty of Q-value estimates across time-steps in analogy with the Bellman equation for values. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution over Q-values induced by any policy, and this bound can be tighter than traditional count-based exploration bonuses, which compound standard deviation rather than variance. Replacing the ε-greedy exploration strategy in DQN with a UBE-based exploration policy improves performance on Atari games.
Abstract
We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. …
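The uncertainty propagation described in the TL;DR can be illustrated with a small sketch. Below is a tabular toy example, not the paper's method as specified: the MDP, the policy, and the count-based local uncertainty term `nu` are all stand-in assumptions. It iterates a UBE-style recursion, u(s,a) ← ν(s,a) + γ² E[u(s',a')], to its unique fixed point, whose values could then serve as an exploration bonus in place of ε-greedy.

```python
import numpy as np

# Hypothetical toy MDP: random transition kernel, uniform evaluation policy.
gamma = 0.9
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# P[s, a] is a distribution over next states s'.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
# pi[s] is a distribution over actions a'.
pi = np.full((n_states, n_actions), 1.0 / n_actions)

# Local uncertainty nu: a simple count-based stand-in that shrinks with visits.
counts = rng.integers(1, 50, size=(n_states, n_actions))
nu = 1.0 / counts

# Iterate u <- nu + gamma^2 * E_{s' ~ P, a' ~ pi}[u(s', a')].
# The map is a gamma^2-contraction, so it converges to a unique fixed point.
u = np.zeros((n_states, n_actions))
for _ in range(500):
    expected_next_u = P @ (pi * u).sum(axis=1)  # average over a', then over s'
    u = nu + gamma**2 * expected_next_u

# An exploration policy could then pick argmax_a Q(s, a) + beta * sqrt(u(s, a))
# instead of acting epsilon-greedily on Q alone.
```

Note the recursion propagates variance (hence the γ² factor), which is what distinguishes the bound from compounding standard deviations as count-based bonuses implicitly do.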