BriefGPT.xyz
Feb, 2019
Diverse Exploration via Conjugate Policies for Policy Gradient Methods
Andrew Cohen, Xingye Qiao, Lei Yu, Elliot Way, Xiangrong Tong
TL;DR
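This paper proposes diverse exploration (DE) via conjugate policies to address the problem of effective exploration while maintaining good performance in policy gradient methods. DE learns and deploys a set of conjugate policies, and the paper provides theoretical and empirical results showing that DE achieves exploration, improves policy performance, and outperforms exploration by random policy perturbations.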
Abstract
We address the challenge of effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse
→