BriefGPT.xyz
Feb, 2021
On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, J. Zico Kolter...
TL;DR
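This paper studies the heavy-tailed nature of the gradients in PPO-style algorithms and proposes GMOM, a high-dimensional robust estimator, as a replacement for several clipping tricks to address the heavy-tailed gradient problem. Experiments on MuJoCo benchmark tasks show performance comparable to PPO.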
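To make the idea of a robust gradient estimator concrete, here is a minimal sketch. It assumes GMOM stands for a geometric median-of-means: split the per-sample gradients into blocks, average each block, and take the geometric median of the block means (computed here with Weiszfeld iterations). The function names `geometric_median` and `gmom`, the block count, and the iteration settings are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def geometric_median(points, n_iters=100, eps=1e-8):
    """Approximate the geometric median of the rows of `points` (Weiszfeld iterations)."""
    median = points.mean(axis=0)
    for _ in range(n_iters):
        dists = np.linalg.norm(points - median, axis=1)
        # Clamp distances so a point that coincides with the estimate does not divide by zero.
        weights = 1.0 / np.maximum(dists, eps)
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < eps:
            return new_median
        median = new_median
    return median


def gmom(per_sample_grads, n_blocks=8):
    """Geometric median of block means for gradients of shape [n_samples, dim].

    Hypothetical helper: the block count is an illustrative choice.
    """
    blocks = np.array_split(per_sample_grads, n_blocks)
    block_means = np.stack([block.mean(axis=0) for block in blocks])
    return geometric_median(block_means)


# Usage: one extreme sample dominates the plain mean but barely moves the robust estimate.
grads = np.ones((64, 3))
grads[0] = 1e6
print("plain mean:", grads.mean(axis=0))
print("GMOM:      ", gmom(grads, n_blocks=8))
```

In this toy setup the single outlier contaminates only one of the eight block means, so the geometric median stays with the majority of uncontaminated blocks, which is the property that lets such an estimator stand in for clipping heuristics.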
Abstract
Modern policy gradient algorithms, notably proximal policy optimization (PPO), rely on an arsenal of heuristics, including loss clipping and grad...