In goal-conditioned hierarchical reinforcement learning (HRL), a high-level
policy specifies a subgoal for the low-level policy to reach. Effective HRL
hinges on a suitable subgoal representation function, which abstracts the state
space into a latent subgoal space and induces varied low-level behaviors. Existing
methods adopt a subgoal representation that provides a deterministic mapping
from state space to latent subgoal space. Instead, this paper uses Gaussian
Processes (GPs) to construct the first probabilistic subgoal representation. Our method
employs a GP prior on the latent subgoal space to learn a posterior
distribution over the subgoal representation functions while exploiting the
long-range correlation in the state space through learnable kernels. This
enables an adaptive memory that integrates long-range subgoal information from
prior planning steps, making it possible to cope with stochastic uncertainties.
Furthermore, we propose a novel learning objective to facilitate the
simultaneous learning of probabilistic subgoal representations and policies
within a unified framework. In experiments, our approach outperforms
state-of-the-art baselines not only in standard benchmarks but also in environments with
stochastic elements and under diverse reward conditions. Additionally, our
model shows promising capabilities in transferring low-level policies across
different tasks.
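
To make the idea of a posterior distribution over subgoal representations
concrete, the following is a minimal sketch of standard GP regression from
states to latent subgoals. It is not the method of this paper: the kernel here
is a fixed squared-exponential kernel rather than a learned one, no policy is
trained, and the names `rbf_kernel`, `gp_posterior`, `states`, and `latents`
are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gp_posterior(states, latents, query_states, noise=1e-2, lengthscale=1.0):
    """Posterior mean and per-point variance of latent subgoals at query_states.

    states:       (n, d) previously visited states (assumed inputs)
    latents:      (n, k) latent subgoal values observed at those states
    query_states: (m, d) states at which to predict the latent subgoal
    """
    k_ss = rbf_kernel(states, states, lengthscale) + noise * np.eye(len(states))
    k_qs = rbf_kernel(query_states, states, lengthscale)
    k_qq = rbf_kernel(query_states, query_states, lengthscale)

    # Solve with a Cholesky factor instead of forming an explicit inverse.
    chol = np.linalg.cholesky(k_ss)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, latents))
    mean = k_qs @ alpha

    v = np.linalg.solve(chol, k_qs.T)
    var = np.diag(k_qq) - np.sum(v**2, axis=0)
    return mean, var


# Illustrative usage with synthetic data (not from the paper's experiments).
rng = np.random.default_rng(0)
states = rng.normal(size=(20, 4))
latents = np.stack([np.sin(states[:, 0]), np.cos(states[:, 1])], axis=1)
query = rng.normal(size=(5, 4))
mean, var = gp_posterior(states, latents, query)
print(mean.shape, var.shape)
```

In this sketch the posterior variance grows for query states far from any
previously visited state, which is the kind of uncertainty signal a
deterministic mapping cannot provide.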

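A high-level policy specifies a subgoal for the low-level policy to reach. In
this paper, we propose a probabilistic subgoal representation method based on
Gaussian processes, which exploits long-range correlation in the state space
through learnable kernels to learn long-range subgoal information from prior
planning steps and thereby adapt to uncertainty. We also propose a new learning
objective that enables the probabilistic subgoal representation and the
policies to be learned simultaneously. Experimental results show that our
method outperforms state-of-the-art baselines on standard benchmarks and in
environments with stochastic elements and diverse reward conditions, and that
our model performs well at transferring low-level policies across different
tasks.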