Aug, 2019
Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
Ching-An Cheng, Xinyan Yan, Byron Boots
TL;DR
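This study analyzes the properties and shortcomings of control variate techniques as applied to policy gradient methods, and proposes a new, recursively constructed trajectory-wise control variate that further reduces variance under reasonable assumptions.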
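To illustrate the general idea of a control variate in a policy gradient estimator, here is a minimal sketch. It is not the trajectory-wise method proposed in the paper; it shows only the standard constant-baseline control variate on a one-step problem. The Gaussian policy, the reward `5 + a`, the baseline value, and the function name `score_function_gradients` are all assumptions made for illustration.

```python
import numpy as np

def score_function_gradients(theta, n_samples, baseline=0.0, seed=0):
    """Per-sample score-function gradient estimates for a one-step problem.

    Assumed setup (hypothetical, for illustration only):
      policy: a ~ Normal(theta, 1)
      reward: r(a) = 5 + a
    """
    rng = np.random.default_rng(seed)
    actions = rng.normal(loc=theta, scale=1.0, size=n_samples)
    rewards = 5.0 + actions
    # Score of a unit-variance Gaussian: d/dtheta log N(a; theta, 1) = a - theta
    score = actions - theta
    # Subtracting a constant baseline keeps the estimator unbiased,
    # because the score has zero mean under the policy.
    return (rewards - baseline) * score

plain = score_function_gradients(theta=0.0, n_samples=100_000)
with_cv = score_function_gradients(theta=0.0, n_samples=100_000, baseline=5.0)

print("mean without / with baseline:", plain.mean(), with_cv.mean())
print("variance without / with baseline:", plain.var(), with_cv.var())
```

Both estimators target the same gradient (which equals 1 in this assumed setup), but the one with the baseline should show a much smaller per-sample variance. The baseline here is set to the expected reward at `theta = 0`, which is known in closed form only because the example is so simple.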
Abstract
Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However,