Feb, 2019

Diverse Exploration via Conjugate Policies for Policy Gradient Methods

Andrew Cohen, Xingye Qiao, Lei Yu, Elliot Way, Xiangrong Tong

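TL;DR: This paper proposes diverse exploration (DE) via conjugate policies to address the problem of exploring effectively while maintaining good performance in policy gradient methods. DE learns and deploys a set of conjugate policies. The paper provides theoretical and empirical results showing that DE achieves exploration, improves policy performance, and outperforms exploration by random policy perturbations.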

Abstract

We address the challenge of effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse →