Dec, 2015
Increasing the Action Gap: New Operators for Reinforcement Learning
Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos
TL;DR
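Introduces several new optimality-preserving operators on Q-functions, including a family of operators built on local policy consistency. These operators mitigate the adverse effects of approximation and estimation errors on the induced greedy policy, and the paper proves their validity for both enumerable discrete problems and continuous problems.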
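To make the idea of an action gap concrete, here is a minimal sketch comparing the standard Bellman optimality backup with the paper's consistent Bellman operator on a tabular problem. The operator is written here as it is defined in the paper: on a self-transition (next state equal to the current state) it backs up `Q[x][a]` itself instead of the maximum over actions. The two-state MDP, the function names, and the discount factor are illustrative assumptions, not taken from this summary.

```python
def backup(Q, P, R, gamma, consistent=False):
    """One sweep of a tabular backup over all (state, action) pairs.

    P[x][a][y] is the probability of moving from state x to y under action a,
    and R[x][a] is the immediate reward. With consistent=True, a self-transition
    (y == x) bootstraps from Q[x][a] rather than max_b Q[x][b].
    """
    n_states, n_actions = len(Q), len(Q[0])
    V = [max(row) for row in Q]  # greedy state values under the current Q
    new_Q = []
    for x in range(n_states):
        row = []
        for a in range(n_actions):
            expected = 0.0
            for y in range(n_states):
                target = Q[x][a] if (consistent and y == x) else V[y]
                expected += P[x][a][y] * target
            row.append(R[x][a] + gamma * expected)
        new_Q.append(row)
    return new_Q


def solve(P, R, gamma, consistent, sweeps=500):
    """Iterate the chosen backup from an all-zero Q-table."""
    Q = [[0.0] * len(R[0]) for _ in R]
    for _ in range(sweeps):
        Q = backup(Q, P, R, gamma, consistent)
    return Q


def action_gap(Q, x):
    """Difference between the best and second-best action values in state x."""
    top, second = sorted(Q[x], reverse=True)[:2]
    return top - second


# Hypothetical toy MDP: in state 0, action 0 ("stay") loops back with reward 0,
# and action 1 ("go") moves to the absorbing state 1 with reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 0.0]]
gamma = 0.9

Q_standard = solve(P, R, gamma, consistent=False)
Q_consistent = solve(P, R, gamma, consistent=True)
gap_standard = action_gap(Q_standard, 0)
gap_consistent = action_gap(Q_consistent, 0)
print(gap_standard, gap_consistent)
```

In this toy problem both operators select the same greedy action in state 0, but the consistent variant leaves a much larger margin between the best and second-best action values, so a small error in the estimates is less likely to flip the greedy choice.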
Abstract
This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of