Learning with sparse rewards remains a significant challenge in reinforcement
learning (RL), especially when the aim is to train a policy capable of
achieving multiple different goals. To date, the most successful approaches for
dealing with multi-goal, sparse reward environments have been model-free RL
algorithms. In this work we propose PlanGAN, a model-based algorithm
specifically designed for solving multi-goal tasks in environments with sparse
rewards. Our method builds on the fact that any trajectory of experience
collected by an agent contains useful information about how to achieve the
goals observed during that trajectory. We use this to train an ensemble of
conditional generative models (GANs) to generate plausible trajectories that
lead the agent from its current state towards a specified goal. We then combine
these imagined trajectories into a novel planning algorithm in order to achieve
the desired goal as efficiently as possible. The performance of PlanGAN has
been tested on a number of robotic navigation/manipulation tasks in comparison
with a range of model-free reinforcement learning baselines, including
Hindsight Experience Replay. Our studies indicate that PlanGAN can achieve
comparable performance whilst being around 4-8 times more sample efficient.

本研究提出了 PlanGAN，一种使用模型的算法，专门针对具有稀疏奖励环境的多目标任务进行求解，该算法比最成功的基于无模型 RL 算法的方法在提高 4-8 倍的样本效率下达到可比较的表现。