BriefGPT.xyz
May, 2019
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Seungyul Han, Youngchul Sung
TL;DR
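This paper presents an improvement to the Proximal Policy Optimization (PPO) algorithm: importance sampling weights are clipped dimension-wise to avoid large bias, which improves the agent's sample efficiency on high-dimensional tasks and the performance of the resulting algorithm.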
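To make the idea concrete, here is a minimal sketch contrasting a PPO-style clip of the joint importance sampling ratio with a clip applied to each action dimension separately. It assumes a policy whose action dimensions are independent, so that per-dimension log-probabilities are available. The function names, the clipping range `eps=0.2`, and the choice to average the per-dimension terms are illustrative assumptions, not the authors' exact objective.

```python
import math


def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def per_dim_ratios(logp_new, logp_old):
    """Per-dimension IS weights for one sample: exp(logp_new[d] - logp_old[d])."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def joint_clipped_term(logp_new, logp_old, advantage, eps=0.2):
    """PPO-style surrogate term: clip the joint ratio over all dimensions."""
    # The joint ratio is the product of the per-dimension ratios.
    ratio = math.exp(sum(n - o for n, o in zip(logp_new, logp_old)))
    return min(ratio * advantage, clip(ratio, 1 - eps, 1 + eps) * advantage)


def dimension_wise_clipped_term(logp_new, logp_old, advantage, eps=0.2):
    """Illustrative variant (assumption): clip each dimension's ratio
    separately, then average the per-dimension terms."""
    ratios = per_dim_ratios(logp_new, logp_old)
    terms = [
        min(r * advantage, clip(r, 1 - eps, 1 + eps) * advantage)
        for r in ratios
    ]
    return sum(terms) / len(terms)


# Hypothetical 10-dimensional action with a small shift in every dimension.
logp_old = [0.0] * 10
logp_new = [0.1] * 10
joint = joint_clipped_term(logp_new, logp_old, advantage=1.0)
dim_wise = dimension_wise_clipped_term(logp_new, logp_old, advantage=1.0)
```

In this toy case each per-dimension ratio is `exp(0.1)`, which lies inside the clipping range, while the joint ratio is `exp(1.0)` and gets clipped. This illustrates why a joint clip may saturate more readily as the action dimension grows.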
Abstract
In importance sampling (IS)-based reinforcement learning algorithms such as proximal policy optimization (PPO), IS weights are typically c…