Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
TL;DR: We propose SHARP, a momentum-based stochastic gradient descent method that uses second-order information and a time-varying learning rate. It reaches first-order stationary points without importance sampling, with the variance of the gradient estimation error decreasing at rate O(1/t^{2/3}). Experimental results show that the algorithm outperforms existing algorithms on control tasks.
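The momentum-based update with a second-order correction and time-varying learning rate can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a toy quadratic objective (whose Hessian is the identity, so the Hessian-vector correction reduces to the parameter difference), and the momentum weight a_t = 1/t^{2/3}, step size η_t, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
theta = rng.normal(size=dim)

def stoch_grad(theta):
    # noisy gradient of the toy objective f(theta) = 0.5 * ||theta||^2
    return theta + 0.1 * rng.normal(size=dim)

d = stoch_grad(theta)          # initial gradient estimate
prev_theta = theta.copy()
for t in range(1, 2001):
    a_t = min(1.0, 1.0 / t ** (2 / 3))   # time-varying momentum weight
    eta_t = 0.5 / t ** (1 / 3)           # time-varying learning rate
    g = stoch_grad(theta)
    # Second-order correction: transport the previous estimate to the
    # current iterate. For this quadratic, the Hessian is the identity,
    # so B_t (theta_t - theta_{t-1}) is just the parameter difference.
    hvp = theta - prev_theta
    d = a_t * g + (1 - a_t) * (d + hvp)
    prev_theta = theta.copy()
    theta = theta - eta_t * d

print(np.linalg.norm(theta))
```

With the decaying momentum weight, the estimate d averages over a growing window of corrected past gradients, which is what drives the variance of the estimation error down over time; no importance-sampling reweighting of past samples is needed.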
Abstract
Variance-reduced gradient estimators for policy gradient methods have been a main focus of reinforcement learning research in recent years, as they accelerate the estimation process. We propose a variance-reduced policy gradient method, called SGDHess-PG, w