Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal. How to explore novel sub-goals efficiently is one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective by optimizing the entropy of both achieved and new goals to be explored for more efficient goal exploration in sub-goal selection based GCRL. To optimize this objective, we first explore and exploit the frequently occurring goal-transition patterns mined in the environments similar to the current task to compose skills via skill learning. Then, the pretrained skills are applied in goal exploration. Evaluation on a variety of spare-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance. The source code is available at: https://github.com/GEAPS/GEAPS.

本文提出了一种新的学习目标，通过优化已实现和未来需要探索的目标的熵，以更高效地探索子目标选择基于GCRL，该方法可以显著提高现有技术的探索效率并改善或保持它们的表现。

利用预训练技能来拓展目标勘探，用于稀疏奖励长时间尺度的目标条件加强学习