BriefGPT.xyz
Oct, 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd
TL;DR
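This work studies policy gradient methods in RL and establishes a global convergence theory for the REINFORCE algorithm, focusing on gradient estimation and sample efficiency.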
Abstract
Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their