In goal-oriented reinforcement learning, relabeling the raw goals in past experience to give agents hindsight ability is a major solution to the reward sparsity problem. In this paper, to enhance the diversity of relabeled goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels goals by looking into the future with a learned dynamics model. In addition, to improve sample efficiency, we propose using the dynamics model to generate simulated trajectories for policy training. By integrating these two improvements, we introduce the MapGo framework (Model-Assisted Policy Optimization for Goal-oriented tasks). In our experiments, we first show the effectiveness of the FGI strategy compared with the hindsight one, and then show that the MapGo framework achieves higher sample efficiency than model-free baselines on a set of complicated tasks.
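To make the contrast between hindsight and foresight relabeling concrete, the following is a minimal sketch and not the authors' implementation. It assumes states and goals live in the same 2-D space, that the relabeled goal is the final state of a short imagined rollout, and that `toy_model` and `toy_policy` stand in for a learned dynamics model and a goal-conditioned policy. All function names and the `horizon` parameter are hypothetical.

```python
import random
from typing import Callable, List, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]


def hindsight_relabel(trajectory: List[State], t: int, rng: random.Random) -> State:
    """Hindsight: pick the new goal among states actually visited after step t.

    Requires t < len(trajectory) - 1 so that at least one later state exists.
    """
    future_index = rng.randint(t + 1, len(trajectory) - 1)  # both bounds inclusive
    return trajectory[future_index]


def foresight_relabel(
    state: State,
    original_goal: State,
    policy: Callable[[State, State], Action],
    model: Callable[[State, Action], State],
    horizon: int,
) -> State:
    """Foresight (assumed form): roll the model forward from `state` under the
    policy and use the imagined final state as the new goal."""
    s = state
    for _ in range(horizon):
        a = policy(s, original_goal)
        s = model(s, a)  # imagined transition; no environment interaction
    return s


def toy_model(s: State, a: Action) -> State:
    """Hypothetical stand-in for a learned dynamics model: next state = s + a."""
    return (s[0] + a[0], s[1] + a[1])


def toy_policy(s: State, g: State) -> Action:
    """Hypothetical goal-conditioned policy: step toward g, at most 0.1 per axis."""
    return tuple(max(-0.1, min(0.1, gi - si)) for si, gi in zip(s, g))


if __name__ == "__main__":
    rng = random.Random(0)
    visited = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.2, 0.2)]
    print("hindsight goal:", hindsight_relabel(visited, t=1, rng=rng))
    print("foresight goal:", foresight_relabel(visited[1], (1.0, 1.0), toy_policy, toy_model, horizon=3))
```

In this sketch the hindsight goal is always one of the logged states, whereas the foresight goal can be a state that never appeared in the replay buffer, which is one way to read the abstract's claim about more diverse relabeled goals.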

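This paper proposes FGI, a new relabeling strategy for alleviating the reward sparsity problem, and improves sample efficiency by using a dynamics model to generate simulated trajectories. It introduces the MapGo framework for model-assisted policy optimization in goal-oriented tasks. Experiments on complicated tasks demonstrate the effectiveness of the FGI strategy compared with the hindsight strategy and show that MapGo achieves higher sample efficiency than model-free baselines.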

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks