Hierarchical reinforcement learning (HRL) has been a compelling approach for
achieving goal-directed behavior over long sequences of actions. However, it
has been challenging to implement in realistic or open-ended environments. A
main challenge has been to find the right space of sub-goals over which to
instantiate a hierarchy. We present a novel approach where we use data from
humans solving these tasks to softly supervise the goal space for a set of
long-range tasks in a 3D embodied environment. In particular, we use unconstrained
natural language to parameterize this space. This has two advantages: first, it
is easy to generate this data from naive human participants; second, it is
flexible enough to represent a vast range of sub-goals in human-relevant tasks.
Our approach outperforms agents that clone expert behavior on these tasks, as
well as HRL from scratch without this supervised sub-goal space. Our work
presents a novel approach to combining human expert supervision with the
benefits and flexibility of reinforcement learning.

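We present a novel approach that uses unconstrained natural language data from humans solving tasks in a 3D embodied environment to softly supervise the goal space for hierarchical reinforcement learning on a set of long-range tasks, addressing the challenge of achieving goal-directed behavior in realistic or open-ended environments.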

Hierarchical reinforcement learning with natural language subgoals

How to behave efficiently and flexibly is a central problem for understanding
biological agents and creating intelligent embodied AI. It is well known
that behavior can be classified into two types: reward-maximizing habitual
behavior, which is fast but inflexible; and goal-directed behavior, which is
flexible but slow. Conventionally, habitual and goal-directed behaviors are
considered to be handled by two distinct systems in the brain. Here, we propose to
bridge the gap between the two behaviors, drawing on the principles of
variational Bayesian theory. We incorporate both behaviors in one framework by
introducing a Bayesian latent variable called "intention". Habitual
behavior is generated from the prior distribution of intention, which is
goal-less; goal-directed behavior is generated from the posterior
distribution of intention, which is conditioned on the goal. Building on this
idea, we present a novel Bayesian framework for modeling behaviors. Our
proposed framework enables skill sharing between the two kinds of behaviors,
and by leveraging the idea of predictive coding, it enables an agent to
seamlessly generalize from habitual to goal-directed behavior without requiring
additional training. The proposed framework suggests a fresh perspective for
cognitive science and embodied AI, highlighting the potential for greater
integration between habitual and goal-directed behaviors.
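
The prior/posterior split described above can be illustrated with a minimal sketch. The abstract does not say how the intention distributions are parameterized, so the diagonal Gaussian form, the `DiagGaussian` class, and the `select_intention` helper below are all assumptions made for illustration. The KL term is included because variational methods commonly use it to keep a posterior close to its prior; whether the authors' objective takes this form is not stated in the abstract.

```python
import math
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagGaussian:
    """Hypothetical diagonal-Gaussian distribution over the latent intention."""
    mean: Sequence[float]
    std: Sequence[float]

    def sample(self, rng: random.Random) -> list:
        # Draw each dimension independently.
        return [rng.gauss(m, s) for m, s in zip(self.mean, self.std)]


def kl_divergence(q: DiagGaussian, p: DiagGaussian) -> float:
    """KL(q || p) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(q.mean, q.std, p.mean, p.std):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def select_intention(prior: DiagGaussian, posterior: DiagGaussian,
                     goal_given: bool, rng: random.Random) -> list:
    """Sample intention from the goal-less prior (habitual behavior)
    or from the goal-conditioned posterior (goal-directed behavior)."""
    source = posterior if goal_given else prior
    return source.sample(rng)


rng = random.Random(0)
prior = DiagGaussian(mean=[0.0, 0.0], std=[1.0, 1.0])      # goal-less
posterior = DiagGaussian(mean=[1.0, 0.0], std=[1.0, 1.0])  # goal-conditioned
habitual_z = select_intention(prior, posterior, goal_given=False, rng=rng)
goal_directed_z = select_intention(prior, posterior, goal_given=True, rng=rng)
gap = kl_divergence(posterior, prior)
```

In this sketch both behaviors share whatever policy consumes the sampled intention, which is one way to read the skill sharing the abstract mentions; the only thing that changes between the two modes is the distribution the intention is drawn from.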

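This study proposes a framework that uses variational Bayesian theory to bridge habitual and goal-directed behavior. It introduces a Bayesian latent variable, intention, whose prior distribution generates habitual behavior and whose posterior distribution generates goal-directed behavior. This enables skill sharing between the two kinds of behavior and lets an agent generalize readily from habitual to goal-directed behavior.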

Habits and goals in synergy: a variational Bayesian framework for behavior

One of the long-standing challenges in Artificial Intelligence for learning
goal-directed behavior is to build a single agent which can solve multiple
tasks. Recent progress in multi-task learning for goal-directed sequential
problems has been in the form of distillation-based learning wherein a student
network learns from multiple task-specific expert networks by mimicking the
task-specific policies of the expert networks. While such approaches offer a
promising solution to the multi-task learning problem, they require supervision
from large expert networks which require extensive data and computation time
for training. In this work, we propose an efficient multi-task learning
framework which solves multiple goal-directed tasks in an on-line setup without
the need for expert supervision. Our work uses active learning principles to
achieve multi-task learning by sampling the harder tasks more than the easier
ones. We propose three distinct models under our active sampling framework:
an adaptive method with extremely competitive multi-tasking performance; a
UCB-based meta-learner which casts the problem of picking the next task to
train on as a multi-armed bandit problem; and a meta-learning method that casts
the next-task picking problem as a full Reinforcement Learning problem and uses
actor-critic methods for optimizing the multi-tasking performance directly. We
demonstrate results in the Atari 2600 domain on seven multi-tasking instances:
three 6-task instances, one 8-task instance, two 12-task instances and one
21-task instance.
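
The idea of sampling harder tasks more often can be sketched in a few lines. The abstract does not give formulas, so everything below is an assumption made for illustration: the function names, the per-task target scores, the softmax over relative shortfall, and the use of standard UCB1 as the bandit rule. Targets are assumed to be positive.

```python
import math


def adaptive_task_probs(scores, targets, temperature=0.1):
    """Sampling probabilities that favor tasks furthest below their target.

    `scores` are the agent's current per-task scores and `targets` are
    positive reference scores (both hypothetical inputs).
    """
    # Relative shortfall in [0, 1]: 0 means the target is already met.
    shortfalls = [max(0.0, (t - s) / t) for s, t in zip(scores, targets)]
    logits = [m / temperature for m in shortfalls]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def ucb1_pick(counts, mean_rewards, c=2.0):
    """Standard UCB1 arm selection, where each arm is a task.

    `mean_rewards[i]` is the average bandit reward observed for task i;
    using the shortfall as that reward makes harder tasks more attractive.
    """
    # Try every task once before applying the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    bounds = [m + math.sqrt(c * math.log(total) / n)
              for m, n in zip(mean_rewards, counts)]
    return max(range(len(bounds)), key=bounds.__getitem__)


probs = adaptive_task_probs(scores=[90.0, 20.0, 50.0],
                            targets=[100.0, 100.0, 100.0])
next_task = ucb1_pick(counts=[10, 10, 10], mean_rewards=[0.1, 0.8, 0.5])
```

The temperature controls how sharply sampling concentrates on the hardest task: a low value focuses training almost entirely on the worst-performing task, while a high value approaches uniform sampling. The third variant in the abstract, which treats task selection as a full reinforcement learning problem, is not sketched here.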

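This work proposes an efficient multi-task learning framework that applies active learning principles to solve multiple goal-directed tasks. Tested on seven multi-tasking instances, it achieves competitive multi-tasking performance.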

Learning to Multi-Task by Active Sampling

This paper presents a computational model of how conversational participants
collaborate in order to make a referring action successful. The model is based
on the view of language as goal-directed behavior. We propose that the content
of a referring expression can be accounted for by the planning paradigm. Not
only does this approach allow the processes of building referring expressions
and identifying their referents to be captured by plan construction and plan
inference, it also allows us to account for how participants clarify a
referring expression by using meta-actions that reason about and manipulate the
plan derivation that corresponds to the referring expression. To account for
how clarification goals arise and how inferred clarification plans affect the
agent, we propose that the agents are in a certain state of mind, and that this
state includes an intention to achieve the goal of referring and a plan that
the agents are currently considering. It is this mental state that sanctions
the adoption of goals and the acceptance of inferred plans, and so acts as a
link between understanding and generation.

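This paper presents a computational model of how conversational participants collaborate to make a referring action successful. It uses the planning paradigm to model how referring expressions are built and how their referents are identified, and it addresses how meta-actions are used to clarify a referring expression and how inferred plans affect the agents.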