Goal representation affects the performance of Hierarchical Reinforcement
Learning (HRL) algorithms, which decompose a complex learning problem into
easier subtasks. Recent studies show that representations that preserve
temporally abstract environment dynamics are successful in solving difficult
problems and provide theoretical guarantees for optimality. However, these
methods cannot scale to tasks where the environment dynamics increase in
complexity, i.e., where the temporally abstract transition relations depend on
a larger number of variables. Other efforts have instead used spatial
abstraction to mitigate these issues; their limitations include poor
scalability to high-dimensional environments and dependence on prior knowledge.
In this paper, we propose a novel three-layer HRL algorithm that introduces,
at different levels of the hierarchy, both a spatial and a temporal goal
abstraction. We provide a theoretical study of the regret bounds of the learned
policies. We evaluate the approach on complex continuous control tasks,
demonstrating the effectiveness of the spatial and temporal abstractions it
learns.
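The three-layer structure can be made concrete with a toy control loop. In this minimal Python sketch, the top layer picks an abstract region of the state space (spatial abstraction). The middle layer sets a concrete subgoal that must be reachable within a fixed horizon of primitive steps (temporal abstraction). The bottom layer executes bounded primitive moves. This illustrates the idea under stated assumptions and is not the paper's algorithm. The grid partition, the hand-coded greedy policies standing in for learned ones, and every name (GridAbstraction, high_level, mid_level, low_level, run) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
Region = Tuple[int, int]


@dataclass(frozen=True)
class GridAbstraction:
    """Hypothetical spatial abstraction: square cells of side `cell`."""
    cell: float

    def region_of(self, s: Point) -> Region:
        # Map a concrete 2D state to the index of the cell containing it.
        return (int(s[0] // self.cell), int(s[1] // self.cell))

    def center(self, region: Region) -> Point:
        # Representative concrete point of an abstract region.
        return ((region[0] + 0.5) * self.cell, (region[1] + 0.5) * self.cell)


def _toward(a: int, b: int) -> int:
    # Move an integer coordinate one unit toward b (or stay if equal).
    return a + (b > a) - (b < a)


def _move(src: Point, dst: Point, max_dist: float) -> Point:
    # Move from src toward dst by at most max_dist (Euclidean distance).
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    dist = math.hypot(dx, dy)
    if dist <= max_dist:
        return dst
    return (src[0] + dx * max_dist / dist, src[1] + dy * max_dist / dist)


def high_level(grid: GridAbstraction, state: Point, target: Point) -> Region:
    # Top layer (spatial): choose the adjacent region one cell closer to the
    # target's region; a greedy stand-in for a learned policy.
    r, t = grid.region_of(state), grid.region_of(target)
    return (_toward(r[0], t[0]), _toward(r[1], t[1]))


def mid_level(grid: GridAbstraction, state: Point, region: Region,
              horizon: int, max_step: float) -> Point:
    # Middle layer (temporal): a concrete subgoal toward the region's center,
    # clipped so the low level can reach it within `horizon` primitive steps.
    return _move(state, grid.center(region), horizon * max_step)


def low_level(state: Point, subgoal: Point, max_step: float) -> Point:
    # Bottom layer: one primitive action of length at most `max_step`.
    return _move(state, subgoal, max_step)


def run(start: Point, target: Point, grid: GridAbstraction,
        horizon: int = 5, max_step: float = 0.5, budget: int = 200):
    state, steps = start, 0
    while grid.region_of(state) != grid.region_of(target) and steps < budget:
        region = high_level(grid, state, target)
        subgoal = mid_level(grid, state, region, horizon, max_step)
        for _ in range(horizon):  # the low level gets at most `horizon` steps
            state = low_level(state, subgoal, max_step)
            steps += 1
            if state == subgoal:
                break
    return state, steps
```

For example, `run((0.5, 0.5), (9.5, 9.5), GridAbstraction(cell=2.0))` walks the agent cell by cell until it enters the target's region `(4, 4)`. In a learned system each layer's hand-coded rule would be replaced by a trained policy.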

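A three-layer Hierarchical Reinforcement Learning (HRL) algorithm that introduces spatial and temporal goal abstractions to improve goal representation. The work provides a theoretical study of regret bounds and evaluates the effectiveness of the learned spatial and temporal abstractions on complex continuous control tasks.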

Reconciling Spatial and Temporal Abstractions for Goal Representation

Open-ended learning benefits immensely from the use of symbolic methods for
goal representation, as they offer ways to structure knowledge for efficient
and transferable learning. However, existing Hierarchical Reinforcement
Learning (HRL) approaches that rely on symbolic reasoning are often limited
because they require a manually designed goal representation. The challenge in autonomously
discovering a symbolic goal representation is that it must preserve critical
information, such as the environment dynamics. In this paper, we propose a
developmental mechanism for goal discovery via an emergent representation that
abstracts (i.e., groups together) sets of environment states that have similar
roles in the task. We introduce a Feudal HRL algorithm that concurrently learns
both the goal representation and a hierarchical policy. The algorithm uses
symbolic reachability analysis for neural networks to approximate the
transition relation among sets of states and to refine the goal representation.
We evaluate our approach on complex navigation tasks, showing that the learned
representation is interpretable and transferable and results in data-efficient
learning.
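The reachability step can be illustrated with interval bound propagation (IBP), one simple form of set-based reachability analysis for ReLU networks. IBP is swapped in here for illustration and may differ from the analysis the paper actually uses. This sketch makes three assumptions that are not taken from the source:

- Abstract goals are axis-aligned boxes of states.
- A small ReLU network, given as `(W, b)` layers, plays the role of a hypothetical learned model of temporally extended transitions.
- `refine` bisects a source box until each piece is either certified to land inside a target box or a depth budget runs out.

All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # per-dimension (lower, upper) bounds
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (W, b)


def affine_interval(W, b, box: Box) -> Box:
    # Interval arithmetic for y = W x + b: a non-negative weight maps lower
    # bound to lower bound, a negative weight swaps the input bounds.
    out = []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, (l, u) in zip(row, box):
            if w >= 0:
                lo, hi = lo + w * l, hi + w * u
            else:
                lo, hi = lo + w * u, hi + w * l
        out.append((lo, hi))
    return out


def relu_interval(box: Box) -> Box:
    # ReLU is monotone, so it can be applied to each bound separately.
    return [(max(0.0, l), max(0.0, u)) for l, u in box]


def reach_box(layers: List[Layer], box: Box) -> Box:
    # Over-approximate the image of `box` under the network; a ReLU follows
    # every layer except the last.
    for i, (W, b) in enumerate(layers):
        box = affine_interval(W, b, box)
        if i < len(layers) - 1:
            box = relu_interval(box)
    return box


def contained(inner: Box, outer: Box) -> bool:
    return all(ol <= il and iu <= ou for (il, iu), (ol, ou) in zip(inner, outer))


def split(box: Box) -> Tuple[Box, Box]:
    # Bisect the box along its widest dimension.
    d = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    l, u = box[d]
    m = (l + u) / 2
    left, right = list(box), list(box)
    left[d], right[d] = (l, m), (m, u)
    return left, right


def refine(layers: List[Layer], source: Box, target: Box,
           depth: int) -> Tuple[List[Box], List[Box]]:
    # Hypothetical refinement: return (certified, uncertified) pieces of
    # `source`. Because IBP over-approximates, a certified piece provably maps
    # inside `target`; an uncertified piece may or may not.
    if contained(reach_box(layers, source), target):
        return [source], []
    if depth == 0:
        return [], [source]
    a, b = split(source)
    ca, ua = refine(layers, a, target, depth - 1)
    cb, ub = refine(layers, b, target, depth - 1)
    return ca + cb, ua + ub


# Toy dynamics f(x) = x + (1, 0): every state shifts one unit along dimension 0.
shift = [([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])]
certified, uncertified = refine(shift, [(0.0, 2.0), (0.0, 1.0)],
                                [(1.0, 2.0), (0.0, 1.0)], depth=1)
# certified   == [[(0.0, 1.0), (0.0, 1.0)]]
# uncertified == [[(1.0, 2.0), (0.0, 1.0)]]
```

The bisection shows how a coarse goal region can be split into finer regions whose transitions are certified. A sound but incomplete check like IBP can only certify, never refute, reachability.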

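We propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states with similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks. The results show that the learned representation is interpretable and transferable and enables data-efficient learning.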

Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis

Humans learn to master open-ended repertoires of skills by imagining and
practicing their own goals. This autotelic learning process, literally the
pursuit of self-generated (auto) goals (telos), becomes more and more
open-ended as the goals become more diverse, abstract and creative. The
resulting exploration of the space of possible skills is supported by an
inter-individual exploration: goal representations are culturally evolved and
transmitted across individuals, in particular using language. Current
artificial agents mostly rely on predefined goal representations corresponding
to goal spaces that are either bounded (e.g., a list of instructions) or
unbounded (e.g., the space of possible visual inputs), but are rarely endowed
with the ability to reshape their goal representations, to form new
abstractions or to imagine creative goals. In this paper, we introduce a
language model augmented autotelic agent (LMA3) that leverages a pretrained
language model (LM) to support the representation, generation and learning of
diverse, abstract, human-relevant goals. The LM is used as an imperfect model
of human cultural transmission, in an attempt to capture aspects of humans'
common sense, intuitive physics and overall interests. Specifically, it
supports three key components of the autotelic architecture: 1) a relabeler
that describes the goals achieved in the agent's trajectories, 2) a goal
generator that suggests new high-level goals along with their decomposition
into subgoals the agent already masters, and 3) reward functions for each of
these goals. Without relying on any hand-coded goal representations, reward
functions or curriculum, we show that LMA3 agents learn to master a large
diversity of skills in a task-agnostic text-based environment.
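The three LM-backed components can be sketched with the pretrained LM replaced by hand-written string rules, so the example runs offline. The sketch covers these pieces:

- The relabeler turns a trajectory into the goals it achieved.
- The repertoire counts those goals to decide which are mastered.
- The goal generator composes mastered goals into a new high-level goal with its subgoal decomposition.
- Each generated goal receives its own reward function.

This is an assumption-laden illustration, not LMA3's implementation. The observation format, the mastery threshold and every name (relabel, Repertoire, make_reward, generate_goal) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Trajectory = List[str]  # text observations from a text-based environment
Reward = Callable[[Trajectory], bool]


def relabel(traj: Trajectory) -> List[str]:
    # Stub relabeler: where LMA3 would ask the LM to describe achieved goals,
    # we treat every observation of the form "you <goal>" as an achieved goal.
    return [line[len("you "):] for line in traj if line.startswith("you ")]


@dataclass
class Repertoire:
    mastery_threshold: int = 2  # hypothetical: achievements needed to "master"
    counts: Dict[str, int] = field(default_factory=dict)

    def add(self, traj: Trajectory) -> None:
        # Hindsight relabeling: credit every goal the trajectory achieved.
        for goal in relabel(traj):
            self.counts[goal] = self.counts.get(goal, 0) + 1

    def mastered(self) -> List[str]:
        return sorted(g for g, c in self.counts.items()
                      if c >= self.mastery_threshold)


def make_reward(subgoals: List[str]) -> Reward:
    # Stub reward function: success iff the subgoals are achieved in order.
    def reward(traj: Trajectory) -> bool:
        i = 0
        for goal in relabel(traj):
            if i < len(subgoals) and goal == subgoals[i]:
                i += 1
        return i == len(subgoals)
    return reward


def generate_goal(mastered: List[str]) -> Tuple[str, List[str], Reward]:
    # Stub goal generator: compose two mastered goals into a new high-level
    # goal, returned with its decomposition and its own reward function.
    if not mastered:
        raise ValueError("no mastered goals to compose")
    subgoals = [mastered[0], mastered[-1]]
    return f"{subgoals[0]} then {subgoals[1]}", subgoals, make_reward(subgoals)


rep = Repertoire()
for _ in range(2):
    rep.add(["you open the door", "you take the key"])
goal, subgoals, reward = generate_goal(rep.mastered())
# goal == "open the door then take the key"
# reward(["you open the door", "you take the key"]) -> True
# reward(["you take the key", "you open the door"]) -> False
```

The new goal is built only from skills the agent already masters, which mirrors the decomposition role the abstract assigns to the goal generator.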

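This work introduces a language model augmented autotelic agent that uses a pretrained language model (LM) to support the generation and learning of diverse, abstract, human-relevant goals. Rather than relying on hand-coded goal representations, reward functions or curricula, the agent learns to master a broad range of skills in a task-agnostic text-based environment.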