March 2018

Learning to Explore with Meta-Policy Gradient

Tianbing Xu, Qiang Liu, Liang Zhao, Wei Xu, Jian Peng

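TL;DR: This paper proposes an adaptive learning method based on a `meta-policy gradient` algorithm. Existing exploration methods add noise to the actor's actions, so they can only explore a local region close to the actor policy. The proposed method instead learns to explore globally, independently of the actor policy, which considerably improves sample efficiency across a range of reinforcement learning tasks.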
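To make the "local exploration" limitation concrete, here is a minimal sketch. It assumes a hypothetical 1-D deterministic actor `actor(s) = 0.5 * s` and compares DDPG-style Gaussian action noise against an exploration policy that ignores the actor. The independent policy here is a plain uniform random policy (`independent_action`, hypothetical), not the paper's learned meta-policy. It only illustrates why noise around the actor stays near the actor's action.

```python
import random


def actor(state: float) -> float:
    # Hypothetical deterministic actor mu(s) = 0.5 * s (illustration only)
    return 0.5 * state


def noisy_action(state: float, sigma: float, rng: random.Random) -> float:
    # Noise-based exploration: perturb the actor's action with Gaussian noise,
    # so samples concentrate in a small neighborhood of actor(state)
    return actor(state) + rng.gauss(0.0, sigma)


def independent_action(rng: random.Random, low: float = -10.0, high: float = 10.0) -> float:
    # Hypothetical exploration policy that does not depend on the actor;
    # a uniform policy stands in for a learned, actor-independent explorer
    return rng.uniform(low, high)


def spread(xs: list[float]) -> float:
    # Width of the region of action space actually visited
    return max(xs) - min(xs)


rng = random.Random(0)
state = 2.0
noisy = [noisy_action(state, sigma=0.1, rng=rng) for _ in range(1000)]
indep = [independent_action(rng) for _ in range(1000)]

print(f"actor action:      {actor(state):.2f}")
print(f"noise-based range: {spread(noisy):.2f}")
print(f"independent range: {spread(indep):.2f}")
```

With a small `sigma`, the noisy samples cover only a narrow interval around `actor(state)`, while the actor-independent policy covers the whole assumed action range. This contrast is the gap the meta-policy gradient approach targets by learning the exploration policy rather than fixing it.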

Abstract

The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the