Jun, 2021
Taylor Expansion of Discount Factors
Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
TL;DR
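This work studies how, in practical reinforcement learning, the mismatch between the discount factor used to estimate value functions and the discount factor used to define the evaluation objective affects learning. It identifies a family of objectives that interpolates between the value functions of two distinct discount factors. Experiments show that this framework improves both value function estimation and policy optimization updates, and the analysis offers new insight into heuristic modifications commonly applied to policy optimization algorithms in deep RL.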
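To make the interpolation idea concrete, here is a minimal tabular sketch. It is not taken from the paper's code. It assumes a fixed policy with a known transition matrix `P` and reward vector `r` (both randomly generated here), and it relies only on the standard identity V_γ = (I − γP)⁻¹ r. Expanding (I − γ'P)⁻¹ around γ gives V_γ' = Σ_k ((γ' − γ)(I − γP)⁻¹P)^k V_γ, so truncating the sum at order K yields a target that sits between V_γ (K = 0) and V_γ' (K → ∞). The function names and the example values γ = 0.9, γ' = 0.95 are illustrative choices.

```python
import numpy as np


def value_function(P, r, gamma):
    """Exact value function V_gamma = (I - gamma * P)^{-1} r for a fixed policy."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def interpolated_value(P, r, gamma, gamma_prime, order):
    """Order-K truncation of the expansion of V_gamma' around gamma.

    order = 0 returns V_gamma; as order grows the result approaches
    V_gamma' whenever |gamma' - gamma| / (1 - gamma) < 1.
    """
    n = len(r)
    v_gamma = value_function(P, r, gamma)
    # (I - gamma * P)^{-1} P, computed with a linear solve instead of an inverse.
    resolvent_p = np.linalg.solve(np.eye(n) - gamma * P, P)
    term = v_gamma.copy()
    total = v_gamma.copy()
    for _ in range(order):
        # Each pass adds the next power of (gamma' - gamma)(I - gamma P)^{-1} P.
        term = (gamma_prime - gamma) * (resolvent_p @ term)
        total = total + term
    return total


# Hypothetical 5-state Markov chain induced by some fixed policy.
rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
r = rng.random(5)

gamma, gamma_prime = 0.9, 0.95
v_low = value_function(P, r, gamma)
v_high = value_function(P, r, gamma_prime)

for order in (0, 1, 5, 60):
    approx = interpolated_value(P, r, gamma, gamma_prime, order)
    print(order, np.max(np.abs(approx - v_high)))
```

The printed gap to V_γ' shrinks as the order grows, which is the sense in which the truncated targets interpolate between the two value functions.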
Abstract
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this di