Assistant AI agents should be capable of rapidly acquiring novel skills and
adapting to new user preferences. Traditional frameworks like imitation
learning and reinforcement learning do not facilitate this capability because
they support only low-level, inefficient forms of communication. In contrast,
humans communicate with progressive efficiency by defining and sharing abstract
intentions. To reproduce a similar capability in AI agents, we develop a novel
learning framework named Communication-Efficient Interactive Learning (CEIL).
By equipping a learning agent with an abstract, dynamic language and an
intrinsic motivation to learn with minimal communication effort, CEIL leads to
the emergence of a human-like pattern in which the learner and the teacher
communicate with progressive efficiency by exchanging increasingly abstract
intentions.
CEIL demonstrates impressive performance and communication efficiency on a 2D
MineCraft domain featuring long-horizon decision-making tasks. Agents trained
with CEIL quickly master new tasks, outperforming non-hierarchical and
hierarchical imitation learning by up to 50% and 20% in absolute success rate,
respectively, given the same number of interactions with the teacher.
Notably, the framework performs robustly with teachers modeled after human
pragmatic communication behavior.

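By introducing a new learning framework named Communication-Efficient Interactive Learning (CEIL), this work reproduces humans' progressively efficient communication in AI agents. CEIL equips the learning agent with an abstract, dynamic language and an intrinsic motivation to learn with minimal communication effort, which leads to the emergence of a human-like pattern in which the learner and the teacher communicate with progressive efficiency by exchanging increasingly abstract intentions. On a 2D MineCraft domain featuring long-horizon decision-making tasks, agents trained with CEIL quickly master new tasks, outperforming non-hierarchical and hierarchical imitation learning by up to 50% and 20% in absolute success rate, respectively, given the same number of interactions with the teacher, and the framework remains robust with teachers modeled after human pragmatic communication behavior.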

Progressively Efficient Learning

The ability to plan actions on multiple levels of abstraction enables
intelligent agents to solve complex tasks effectively. However, learning the
models for both low and high-level planning from demonstrations has proven
challenging, especially with higher-dimensional inputs. To address this issue,
we propose to use reinforcement learning to identify subgoals in expert
trajectories by associating the magnitude of the rewards with the
predictability of low-level actions given the state and the chosen subgoal. We
build a vector-quantized generative model for the identified subgoals to
perform subgoal-level planning. In experiments, the algorithm excels at solving
complex, long-horizon decision-making problems, outperforming state-of-the-art
methods. Because of its ability to plan, our algorithm can find better
trajectories than the ones in the training set.
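
The vector-quantization step mentioned above can be illustrated with a minimal sketch: each continuous subgoal embedding is replaced by its nearest entry in a discrete codebook. This is not the paper's model. The function name `quantize`, the 2-D embeddings, and the three-entry codebook are hypothetical and chosen only for illustration; a real system would learn both the encoder and the codebook.

```python
import numpy as np

def quantize(embeddings, codebook):
    """Map each embedding to the index and vector of its nearest codebook entry."""
    # Pairwise squared Euclidean distances, shape (n_embeddings, n_codes).
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Hypothetical 2-D subgoal embeddings and a 3-entry codebook (illustration only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
embeddings = np.array([[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]])
indices, quantized = quantize(embeddings, codebook)
```

Working with the discrete `indices` rather than the raw embeddings is what makes planning over a finite set of subgoals possible in this kind of setup.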

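This paper proposes using reinforcement learning to identify subgoals in expert trajectories and builds a vector-quantized generative model over those subgoals for subgoal-level planning; the resulting algorithm excels at complex, long-horizon decision-making problems and outperforms state-of-the-art methods.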

Hierarchical Imitation Learning with Vector Quantized Models

Model-based reinforcement learning methods often use learning only for the
purpose of estimating an approximate dynamics model, offloading the rest of the
decision-making work to classical trajectory optimizers. While conceptually
simple, this combination has a number of empirical shortcomings, suggesting
that learned models may not be well-suited to standard trajectory optimization.
In this paper, we consider what it would look like to fold as much of the
trajectory optimization pipeline as possible into the modeling problem, such
that sampling from the model and planning with it become nearly identical. The
core of our technical approach lies in a diffusion probabilistic model that
plans by iteratively denoising trajectories. We show how classifier-guided
sampling and image inpainting can be reinterpreted as coherent planning
strategies, explore the unusual and useful properties of diffusion-based
planning methods, and demonstrate the effectiveness of our framework in control
settings that emphasize long-horizon decision-making and test-time flexibility.
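
To make the idea of planning as iterative denoising with inpainting-style conditioning more concrete, here is a toy sketch. It is not the paper's method: the learned diffusion model is replaced by a hand-written smoothing step, classifier guidance is omitted, and the names `toy_denoise_step` and `plan`, along with every parameter value, are assumptions made for illustration. The only point it shows is how known states (here the start and goal) can be re-imposed after every denoising step, in the way inpainting fixes known pixels.

```python
import numpy as np

def toy_denoise_step(traj, noise_scale, rng):
    """Stand-in for a learned denoiser: smooth the path, then add a little noise."""
    smoothed = traj.copy()
    # Pull each interior state toward the midpoint of its two neighbours.
    smoothed[1:-1] = 0.5 * traj[1:-1] + 0.25 * (traj[:-2] + traj[2:])
    return smoothed + noise_scale * rng.standard_normal(traj.shape)

def plan(start, goal, horizon=16, n_steps=50, seed=0):
    """Refine a random trajectory while keeping its endpoints fixed."""
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal((horizon, start.shape[0]))  # begin from pure noise
    for k in range(n_steps, 0, -1):
        noise_scale = 0.1 * (k - 1) / n_steps  # anneals to zero on the final step
        traj = toy_denoise_step(traj, noise_scale, rng)
        # Inpainting-style conditioning: overwrite the known states after every step.
        traj[0], traj[-1] = start, goal
    return traj

start = np.array([0.0, 0.0])
goal = np.array([1.0, 1.0])
trajectory = plan(start, goal)
```

Because the constraint is applied at sampling time rather than baked into training, the same toy sampler can be reused with different start and goal states, which loosely mirrors the test-time flexibility the abstract describes.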

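This paper folds as much of the trajectory optimization pipeline as possible into the modeling problem, using a diffusion probabilistic model that plans by iteratively denoising trajectories so that sampling and planning become nearly identical. It reinterprets classifier-guided sampling and image inpainting as planning strategies and demonstrates the framework in control settings that emphasize long-horizon decision-making and test-time flexibility.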