June 2021

Taylor Expansion of Discount Factors

Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

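TL;DR: In practical reinforcement learning, the discount factor used to estimate value functions often differs from the one used to define the evaluation objective. This work studies how that mismatch affects learning and identifies a family of objectives that interpolates between the value functions of two distinct discount factors. Experiments show that the framework can improve both value function estimation and policy optimization updates, and it offers new insight into heuristic modifications commonly applied to policy optimization algorithms in deep reinforcement learning.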
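To make the idea of interpolating between two discount factors concrete, the following is a minimal sketch for a small tabular Markov reward process under a fixed policy. It relies on the algebraic identity V_γ' = Σ_k ((γ' − γ)(I − γP)^{-1} P)^k V_γ, which follows from factoring I − γ'P and may or may not match the exact objectives used in the paper. The transition matrix `P`, reward vector `r`, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def value_function(P, r, gamma):
    """Exact tabular value function V_gamma = (I - gamma * P)^{-1} r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def taylor_interpolated_value(P, r, gamma, gamma_prime, order):
    """Truncated expansion of V_{gamma'} around V_gamma.

    order = 0 returns V_gamma; as order grows the result approaches
    V_{gamma'} (assuming gamma < gamma' < 1, so the series converges).
    """
    n = P.shape[0]
    v_gamma = value_function(P, r, gamma)

    def step(v):
        # Apply the operator (gamma' - gamma) * (I - gamma * P)^{-1} P once.
        return (gamma_prime - gamma) * np.linalg.solve(
            np.eye(n) - gamma * P, P @ v
        )

    term = v_gamma
    total = v_gamma.copy()
    for _ in range(order):
        term = step(term)
        total = total + term
    return total


def demo_errors(orders=(0, 1, 5, 50), gamma=0.9, gamma_prime=0.95):
    """Max-norm error against the exact V_{gamma'} for each truncation order."""
    rng = np.random.default_rng(0)
    P = rng.random((4, 4))
    P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
    r = rng.random(4)
    target = value_function(P, r, gamma_prime)
    return [
        float(np.max(np.abs(
            taylor_interpolated_value(P, r, gamma, gamma_prime, k) - target
        )))
        for k in orders
    ]


if __name__ == "__main__":
    for k, err in zip((0, 1, 5, 50), demo_errors()):
        print(f"order {k:2d}: max error {err:.3e}")
```

In this sketch the truncation order acts as the interpolation knob: order 0 reproduces the value function under γ, and higher orders move toward the value function under γ'. With the assumed values γ = 0.9 and γ' = 0.95, each additional term at least halves the max-norm error.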

Abstract

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this di