BriefGPT.xyz
Sep, 2020
Phasic Policy Gradient
Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
TL;DR
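Phasic Policy Gradient (PPG) is a reinforcement learning framework that modifies traditional on-policy actor-critic methods by splitting policy training and value function training into two distinct phases, improving sample efficiency while retaining the advantages of each.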
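The two-phase structure described above can be pictured as an alternating training schedule. The following is only a minimal sketch of that alternation, not the authors' implementation: the function name `run_phasic_training`, the callbacks `policy_step` and `value_step`, and the iteration counts are all hypothetical placeholders, and this summary does not specify what happens inside each phase.

```python
from typing import Callable, List


def run_phasic_training(
    num_cycles: int,
    policy_iters: int,
    value_iters: int,
    policy_step: Callable[[], None],
    value_step: Callable[[], None],
) -> List[str]:
    """Alternate between a policy phase and a value-function phase.

    Every name and count here is an illustrative assumption. The
    callbacks stand in for whatever update each phase performs.
    """
    schedule: List[str] = []
    for _ in range(num_cycles):
        # Phase 1: run the policy updates for this cycle.
        for _ in range(policy_iters):
            policy_step()
            schedule.append("policy")
        # Phase 2: run the value-function updates for this cycle.
        for _ in range(value_iters):
            value_step()
            schedule.append("value")
    return schedule
```

Calling `run_phasic_training(2, 3, 1, policy_step, value_step)` with two no-argument callbacks would run three policy updates followed by one value update, twice over. Because the phases are separated in the loop, each one can in principle use its own number of iterations.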
Abstract
We introduce phasic policy gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by …