Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to
spontaneously set diverse goals and thereby learn a set of skills. Despite
strong results across various domains, reaching distant goals in temporally
extended tasks remains a challenge for GCRL. Recent work tackles this problem
by using planning algorithms to generate intermediate subgoals that augment
GCRL. These methods have two crucial requirements: (i) a state representation
space in which to search for valid subgoals, and (ii) a distance function to
measure the reachability of subgoals. However, these methods struggle to scale
to high-dimensional state spaces because their representations are not compact.
Moreover, they cannot collect high-quality training data through standard
goal-conditioned policies, which results in an inaccurate distance function.
Both issues degrade the efficiency and performance of planning and policy
learning. In this paper, we propose a goal-conditioned RL
algorithm combined with Disentanglement-based Reachability Planning (REPlan) to
solve temporally extended tasks. In REPlan, a Disentangled Representation
Module (DRM) is proposed to learn compact representations which disentangle
robot poses and object positions from high-dimensional observations in a
self-supervised manner. A simple REachability discrimination Module (REM) is
also designed to determine the temporal distance of subgoals. Moreover, REM
computes intrinsic bonuses to encourage the collection of novel states for
training. We evaluate REPlan on three vision-based simulation tasks and one
real-world task. The experiments demonstrate that REPlan significantly
outperforms prior state-of-the-art methods in solving temporally extended
tasks.
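
To make the two requirements above concrete, here is a minimal sketch of subgoal planning over a compact latent space, together with a reachability-based novelty bonus. It is not the REPlan implementation: the names `toy_reachable`, `plan_subgoals`, and `novelty_bonus` are hypothetical, a fixed Euclidean radius stands in for a learned reachability discriminator, and breadth-first search stands in for whatever planner the paper uses.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Latent = Tuple[float, ...]
Reachable = Callable[[Latent, Latent], bool]


def toy_reachable(a: Latent, b: Latent, radius: float = 1.5) -> bool:
    """Hypothetical stand-in for a learned reachability discriminator.

    Assumption: b counts as reachable from a within a few steps when the
    Euclidean distance between the two latent codes is at most `radius`.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 <= radius


def plan_subgoals(
    start: Latent,
    goal: Latent,
    candidates: Sequence[Latent],
    reachable: Reachable = toy_reachable,
) -> Optional[List[Latent]]:
    """Breadth-first search over candidate latents.

    Returns the chain start -> subgoals -> goal with the fewest hops,
    or None when no chain of reachable hops connects start to goal.
    """
    nodes: List[Latent] = [start, *candidates, goal]
    goal_idx = len(nodes) - 1
    parent: Dict[int, Optional[int]] = {0: None}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == goal_idx:
            # Walk parent pointers back to the start, then reverse.
            path: List[Latent] = []
            cur: Optional[int] = i
            while cur is not None:
                path.append(nodes[cur])
                cur = parent[cur]
            return path[::-1]
        for j in range(len(nodes)):
            if j not in parent and reachable(nodes[i], nodes[j]):
                parent[j] = i
                queue.append(j)
    return None


def novelty_bonus(
    state: Latent,
    memory: Sequence[Latent],
    reachable: Reachable = toy_reachable,
) -> float:
    """Return 1.0 when no remembered state can reach `state`, else 0.0."""
    return 0.0 if any(reachable(m, state) for m in memory) else 1.0
```

In this toy setting, planning from `(0.0, 0.0)` to `(3.0, 0.0)` with candidates `(1.0, 0.0)`, `(2.0, 0.0)`, and `(5.0, 5.0)` chains through the two nearby candidates and ignores the distant one. A learned discriminator would replace `toy_reachable` without changing the search.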

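We propose a goal-conditioned reinforcement learning algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks; in simulated and real-world tasks, REPlan significantly outperforms prior state-of-the-art methods.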

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

While traditional methods for instruction-following typically assume prior
linguistic and perceptual knowledge, many recent works in reinforcement
learning (RL) have proposed learning policies end-to-end, typically by training
neural networks to map joint representations of observations and instructions
directly to actions. In this work, we present a novel framework for learning to
perform temporally extended tasks using spatial reasoning in the RL framework,
by sequentially imagining visual goals and choosing appropriate actions to
fulfill imagined goals. Our framework operates on raw pixel images, assumes no
prior linguistic or perceptual knowledge, and learns via intrinsic motivation
and a single extrinsic reward signal measuring task completion. We validate our
method on two robot-arm environments in a simulated interactive 3D setting. Our
method outperforms two flat architectures with raw-pixel and
ground-truth states, and a hierarchical architecture with ground-truth states
on object arrangement tasks.
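
The control loop described above (imagine a goal, act to fulfill it, repeat) can be sketched on a toy one-dimensional task. This is an illustration under stated assumptions rather than the authors' architecture: the integer state, the names `run_episode`, `toy_imagine_goal`, and `toy_act`, and the unit intrinsic reward per fulfilled goal are all hypothetical, whereas the paper operates on raw pixel images with learned networks.

```python
from typing import Callable, List, Tuple


def toy_imagine_goal(state: int, target: int, stride: int = 3) -> int:
    """Hypothetical high-level policy: propose a goal at most `stride` away."""
    delta = max(-stride, min(stride, target - state))
    return state + delta


def toy_act(state: int, goal: int) -> int:
    """Hypothetical low-level policy: move one unit toward the imagined goal."""
    if goal > state:
        return state + 1
    if goal < state:
        return state - 1
    return state


def run_episode(
    state: int,
    target: int,
    imagine_goal: Callable[[int, int], int] = toy_imagine_goal,
    act: Callable[[int, int], int] = toy_act,
    max_goals: int = 10,
    steps_per_goal: int = 5,
) -> Tuple[List[int], float, float]:
    """Alternate between imagining a goal and acting to fulfill it.

    Returns (visited states, total intrinsic reward, extrinsic reward).
    Intrinsic reward: 1.0 per imagined goal that is reached.
    Extrinsic reward: 1.0 only if the task target is reached.
    """
    trace = [state]
    intrinsic = 0.0
    for _ in range(max_goals):
        goal = imagine_goal(state, target)
        for _ in range(steps_per_goal):
            if state == goal:
                break
            state = act(state, goal)
            trace.append(state)
        if state == goal:
            intrinsic += 1.0
        if state == target:
            return trace, intrinsic, 1.0
    return trace, intrinsic, 0.0
```

Starting at state 0 with target 7, the toy policies imagine the goals 3, 6, and 7 in turn. The single extrinsic signal arrives only when the final goal coincides with the task target, while each fulfilled intermediate goal contributes intrinsic reward.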

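This paper proposes a learning method based on spatial reasoning within the RL framework that completes tasks by imagining visual goals and choosing appropriate actions, learning from intrinsic motivation and a single extrinsic reward signal. The method is validated in two simulated 3D environments and, on object arrangement tasks, outperforms two flat architectures and one hierarchical architecture.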