Learning a diverse set of skills by interacting with an environment without
any external supervision is an important challenge. In particular, obtaining a
goal-conditioned agent that can reach any given state is useful in many
applications. We propose a novel method for training such a goal-conditioned
agent without any external rewards or any domain knowledge. We use random walks
to train a reachability network that predicts the similarity between two
states. This reachability network is then used to build a goal memory
containing past observations that are diverse and well-balanced. Finally, we
train a goal-conditioned policy network with goals sampled from the goal memory
and reward it using the reachability network and the goal memory. All the
components are kept updated throughout training as the agent discovers and
learns new goals. We apply our method to continuous control navigation and
robotic manipulation tasks.
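
The first two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how training pairs are labeled or how the goal memory decides what to keep, so everything here is an assumption made for illustration. Pairs of states from a random walk are labeled by temporal distance (hypothetical `near` and `far` thresholds), and an observation enters the goal memory only if it is dissimilar to every stored goal (hypothetical `threshold`). The function `placeholder_similarity` stands in for a trained reachability network, and the 2-D point states are a toy substitute for real observations.

```python
import math
import random


def random_walk(start, steps, rng, step_size=1.0):
    """Run a 2-D random walk and return the list of visited states."""
    x, y = start
    states = [(x, y)]
    for _ in range(steps):
        x += rng.uniform(-step_size, step_size)
        y += rng.uniform(-step_size, step_size)
        states.append((x, y))
    return states


def make_pairs(trajectory, rng, near=5, far=20, n_pairs=100):
    """Build (state_a, state_b, label) training pairs from one trajectory.

    Assumed labeling rule: label 1 if the two states are at most `near`
    steps apart, label 0 if they are at least `far` steps apart.
    Pairs with a gap in between are skipped as ambiguous.
    """
    pairs = []
    n = len(trajectory)
    while len(pairs) < n_pairs:
        i, j = rng.randrange(n), rng.randrange(n)
        gap = abs(i - j)
        if gap <= near:
            pairs.append((trajectory[i], trajectory[j], 1))
        elif gap >= far:
            pairs.append((trajectory[i], trajectory[j], 0))
    return pairs


def placeholder_similarity(a, b):
    """Stand-in for the reachability network: similarity in (0, 1]."""
    return math.exp(-math.dist(a, b))


def update_goal_memory(memory, observation, similarity, threshold=0.5):
    """Append `observation` only if it is dissimilar to every stored goal.

    Returns True if the observation was added. This keeps the memory
    diverse; the balancing mentioned in the abstract is not modeled here.
    """
    if all(similarity(observation, goal) < threshold for goal in memory):
        memory.append(observation)
        return True
    return False
```

In a full system the labeled pairs would train a binary classifier over state pairs, and that classifier would replace `placeholder_similarity` both when filtering the goal memory and when computing the policy's reward.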

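We propose a novel method that uses random walks to train a reachability network predicting the similarity between two states in an environment, uses the resulting reachability network to build a goal memory, and finally trains a goal-conditioned agent capable of reaching any given state. The method is applied to continuous control navigation and robotic control tasks.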