BriefGPT.xyz
Jan, 2019
Trust Region-Guided Proximal Policy Optimization
Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan
TL;DR
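This work analyzes the exploration behavior of proximal policy optimization (PPO) in depth and proposes a new policy optimization method, Trust Region-Guided PPO. The method adaptively adjusts the clipping range to address the lack of exploration under poor initial conditions, and it is shown to perform better than the original PPO algorithm.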
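As a rough illustration of what a clipping range does, the sketch below computes the standard PPO clipped surrogate for a single sample. The function name `clipped_surrogate` and the parameters `eps_low` and `eps_high` are hypothetical names chosen for this example. The rule Trust Region-Guided PPO uses to pick the range is not described in this summary, so the wider range below is simply passed in by hand as an assumption.

```python
def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    ratio     : pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage : estimated advantage of the action
    eps_low, eps_high : lower/upper clipping widths (hypothetical names);
        standard PPO uses one fixed value for both, commonly 0.2.
    """
    # Restrict the ratio to the interval [1 - eps_low, 1 + eps_high].
    clipped_ratio = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Fixed range, as in standard PPO: the ratio 1.5 is cut back to 1.2.
fixed = clipped_surrogate(ratio=1.5, advantage=2.0)

# A wider upper range (an assumed value, not the paper's rule) leaves
# the same ratio unclipped, so the policy may move further on this sample.
wide = clipped_surrogate(ratio=1.5, advantage=2.0, eps_high=0.6)

# With a negative advantage, the lower bound is the one that binds.
negative = clipped_surrogate(ratio=0.5, advantage=-1.0)
```

With a fixed range, every sample is limited in the same way regardless of how unlikely the action was under the old policy. An adaptive range, as the TL;DR describes, would presumably vary `eps_low` and `eps_high` per sample.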
Abstract
Model-free reinforcement learning relies heavily on a safe yet exploratory policy search. Proximal policy optimization (PPO) is a prominent algorithm to address the safe search problem, by exploiting a heuristic …