Goal-conditioned reinforcement learning (RL) is a promising direction for
training agents that are capable of solving multiple tasks and reaching a
diverse set of objectives. How to \textit{specify} and \textit{ground} these
goals in such a way that we can both reliably reach goals during training and
generalize to new goals during evaluation remains an open area of research.
Defining goals in the space of noisy and high-dimensional sensory inputs makes
it difficult both to train goal-conditioned agents and to generalize to novel
goals. We propose to address this by learning factorial representations of
goals and passing the resulting representation through a discretization
bottleneck, which yields a coarser goal specification; we call this approach
DGRL.
We show that applying a discretizing bottleneck can improve performance in
goal-conditioned RL setups, by experimentally evaluating this method on tasks
ranging from maze environments to complex robotic navigation and manipulation.
Additionally, we prove a theorem lower-bounding the expected return on
out-of-distribution goals, while still allowing goals to be specified with
expressive combinatorial structure.
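
To make the idea of a factorial, discretized goal concrete, here is a minimal
sketch. It assumes a nearest-neighbor codebook lookup applied independently to
each factor of the goal embedding; the abstract does not specify the
quantization scheme, so the function name discretize_goal, the shared
codebook, and the toy values are all hypothetical and chosen only for
illustration.

```python
import numpy as np


def discretize_goal(goal_embedding, codebook, num_factors):
    """Quantize a continuous goal embedding one factor at a time.

    Assumptions (not taken from the abstract): the embedding is split into
    `num_factors` equal-sized segments, and every segment is snapped to its
    nearest row of a single shared codebook.

    goal_embedding: array of shape (d,), with d divisible by num_factors.
    codebook:       array of shape (K, d // num_factors).
    Returns (codes, quantized): integer codes of shape (num_factors,) and
    the quantized embedding of shape (d,).
    """
    segments = goal_embedding.reshape(num_factors, -1)  # (G, d / G)
    # Squared Euclidean distance from each segment to each code vector.
    diffs = segments[:, None, :] - codebook[None, :, :]  # (G, K, d / G)
    dists = (diffs ** 2).sum(axis=-1)                    # (G, K)
    codes = dists.argmin(axis=1)                         # (G,)
    quantized = codebook[codes].reshape(-1)              # (d,)
    return codes, quantized


# Toy codebook with K = 4 code vectors of dimension 2.
codebook = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Two slightly different continuous goals, each split into G = 2 factors.
goal = np.array([0.1, 0.9, 0.8, 0.2])
perturbed_goal = np.array([0.2, 0.8, 0.9, 0.1])

codes, quantized = discretize_goal(goal, codebook, num_factors=2)
perturbed_codes, _ = discretize_goal(perturbed_goal, codebook, num_factors=2)

# Number of distinct discrete goals expressible with G factors and K codes.
num_discrete_goals = codebook.shape[0] ** 2
```

In this toy setting both goals map to the same pair of codes, which is the
sense in which the bottleneck gives a coarser goal specification, while two
factors over four codes can still express 16 distinct discrete goals.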

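This paper proposes a method called DGRL, which learns factorial representations of goals and processes them through a discretization bottleneck to obtain a coarser goal specification, addressing the challenge of defining goals in noisy, high-dimensional input spaces. Experiments show that applying a discretization bottleneck can improve performance in goal-conditioned RL setups.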

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

With the resurgence of interest in neural networks, representation learning
has re-emerged as a central focus in artificial intelligence. Representation
learning refers to the discovery of useful encodings of data that make
domain-relevant information explicit. Factorial representations identify
underlying independent causal factors of variation in data. A factorial
representation is compact and faithful, makes the causal factors explicit, and
facilitates human interpretation of data. Factorial representations support a
variety of applications, including the generation of novel examples, indexing
and search, novelty detection, and transfer learning.
This article surveys various constraints that encourage a learning algorithm
to discover factorial representations. I dichotomize the constraints in terms
of unsupervised and supervised inductive bias. Unsupervised inductive biases
exploit assumptions about the environment, such as the statistical distribution
of factor coefficients, assumptions about the perturbations a factor should be
invariant to (e.g. a representation of an object can be invariant to rotation,
translation or scaling), and assumptions about how factors are combined to
synthesize an observation. Supervised inductive biases are constraints on the
representations based on additional information connected to observations.
Supervisory labels come in a variety of types, which vary in how strongly they
constrain the representation, how many factors are labeled, how many
observations are labeled, and whether or not we know the associations between
the constraints and the factors they are related to.
This survey brings together a wide variety of models that all touch on the
problem of learning factorial representations and lays out a framework for
comparing these models based on the strengths of the underlying supervised and
unsupervised inductive biases.

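This survey presents various constraints that encourage a learning algorithm to discover factorial representations and divides them into unsupervised and supervised inductive biases. It also lays out a framework for comparing models that address the problem of learning factorial representations, based on the strengths of their supervised and unsupervised inductive biases.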