Policy Continuation with Hindsight Inverse Dynamics

Oct 2019

Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou

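TL;DR: This paper proposes a new method called PCHID, which efficiently solves sparse-reward tasks by using Hindsight Experience Replay to learn Hindsight Inverse Dynamics, and achieves significant improvements in sample efficiency and final performance on the multi-goal tasks GridWorld and FetchReach.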
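To make the idea of learning inverse dynamics from hindsight-relabeled experience more concrete, here is a minimal sketch of the one-step case. It is not the authors' implementation. It assumes a toy unbounded grid, treats the state actually reached after each step as the goal in hindsight, and uses a lookup table in place of a learned model. All names (`step`, `rollout`, `relabel_one_step`, `MOVES`) are hypothetical.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]
Action = str
Transition = Tuple[State, Action, State]

# Assumed toy dynamics: four moves on an unbounded 2D grid.
MOVES: Dict[Action, Tuple[int, int]] = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def step(state: State, action: Action) -> State:
    """Apply one move and return the next state."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


def rollout(start: State, actions: List[Action]) -> List[Transition]:
    """Collect (s_t, a_t, s_{t+1}) transitions for a fixed action sequence."""
    trajectory = []
    state = start
    for action in actions:
        next_state = step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


def relabel_one_step(trajectory: List[Transition]) -> List[Tuple[Tuple[State, State], Action]]:
    """Hindsight relabeling: treat the state reached at t+1 as the goal.

    Each transition becomes a supervised sample ((s_t, goal), a_t),
    which needs no reward signal at all.
    """
    samples = []
    for state, action, next_state in trajectory:
        if next_state != state:  # skip steps that did not change the state
            samples.append(((state, next_state), action))
    return samples


trajectory = rollout((0, 0), ["right", "right", "down"])
dataset = relabel_one_step(trajectory)

# A lookup table stands in for a learned inverse dynamics model here.
inverse_dynamics = dict(dataset)
print(inverse_dynamics[((0, 0), (1, 0))])
```

In this sketch every transition yields a valid training pair regardless of whether the original goal was reached, which is why relabeling helps when rewards are sparse. A practical version would replace the lookup table with a function approximator trained on the same `(state, goal) -> action` pairs.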

Abstract

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called →