Hindsight Policy Gradients

Nov 2017

Hindsight policy gradients

Paulo Rauber, Filipe Mutz, Juergen Schmidhuber

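TL;DR: This paper studies how to introduce hindsight into policy gradient methods. Experiments across a variety of sparse-reward settings show that hindsight can significantly improve sample efficiency.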

Abstract

Goal-conditional policies allow reinforcement learning agents to pursue specific goals during different episodes. In addition to their potential to generalize desired behavior to unseen goals, such policies may also help in defining options for arbitrary →