BriefGPT.xyz
Sep, 2019
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Pan Xu, Felicia Gao, Quanquan Gu
TL;DR
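This study aims to improve sample efficiency in reinforcement learning. It proposes a new policy gradient algorithm called SRVR-PG and validates its performance through numerical experiments.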
Abstract
Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing