The full potential of large pretrained models remains largely untapped in
control domains like robotics. This is mainly because of the scarcity of data
and the computational challenges associated with training or fine-tuning these
large models for such applications. Prior work mainly emphasizes effective
pretraining of large models for decision-making, with little exploration into
how to perform data-efficient continual adaptation of these models for new
tasks. Recognizing these constraints, we introduce TAIL (Task-specific Adapters
for Imitation Learning), a framework for efficient adaptation to new control
tasks. Inspired by recent advancements in parameter-efficient fine-tuning in
language domains, we explore efficient fine-tuning techniques -- e.g.,
Bottleneck Adapters, P-Tuning, and Low-Rank Adaptation (LoRA) -- in TAIL to
adapt large pretrained models for new tasks with limited demonstration data.
Our extensive experiments in large-scale language-conditioned manipulation
tasks comparing prevalent parameter-efficient fine-tuning techniques and
adaptation baselines suggest that TAIL with LoRA can achieve the best
post-adaptation performance with only 1% of the trainable parameters of full
fine-tuning, while avoiding catastrophic forgetting and preserving adaptation
plasticity in continual learning settings.
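
To make the parameter saving concrete, the following is a minimal sketch of a LoRA-style linear layer, not the TAIL implementation. It assumes the usual low-rank formulation, in which a frozen weight matrix W is augmented by a trainable product B A scaled by alpha / rank. The class name `LoRALinear`, the helper `matvec`, and the chosen sizes are hypothetical and serve only as illustration.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Illustrative linear layer with a frozen weight and a low-rank update.

    Assumption: output = W x + (alpha / rank) * B (A x), where only A and B
    would be trained. Names and initialisation choices are hypothetical.
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight (random here, as a stand-in).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable down-projection A (rank x d_in).
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # Trainable up-projection B (d_out x rank), zero-initialised so the
        # adapted layer starts out identical to the frozen layer.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, rank=2)
ratio = layer.trainable_params() / layer.frozen_params()
print(f"trainable fraction of this layer: {ratio:.2%}")
```

In this toy configuration the trainable fraction is determined purely by the chosen rank and layer size; it is not meant to reproduce the 1% figure reported above, which depends on the actual model and adapter settings.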

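The TAIL framework uses LoRA to adapt large pretrained models efficiently: on new tasks it trains only 1% of the parameters that full fine-tuning would, while avoiding catastrophic forgetting and preserving adaptation plasticity in continual learning settings.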

TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models

Quality-Diversity is a branch of stochastic optimization that is often
applied to problems from the Reinforcement Learning and control domains in
order to construct repertoires of well-performing policies/skills that exhibit
diversity with respect to a behavior space. Such archives are usually composed
of a finite number of reactive agents which are each associated with a unique
behavior descriptor, and instantiating behavior descriptors outside of that
coarsely discretized space is not straightforward. While a few recent works
suggest solutions to that issue, the trajectory that is generated is not easily
customizable beyond the specification of a target behavior descriptor. We
propose to jointly solve those problems in environments where semantic
information about static scene elements is available by leveraging a Large
Language Model to augment the repertoire with natural language descriptions of
trajectories, and training a policy conditioned on those descriptions. Thus,
our method allows a user to not only specify an arbitrary target behavior
descriptor, but also provide the model with a high-level textual prompt to
shape the generated trajectory. We also propose an LLM-based approach to
evaluating the performance of such generative agents. Furthermore, we develop a
benchmark based on simulated robot navigation in a 2D maze that we use for
experimental validation.
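
The limitation of a coarsely discretized repertoire can be illustrated with a small sketch. The code below assumes a MAP-Elites-style grid archive, which the abstract does not name explicitly; the class `GridArchive`, its methods, and the string policies are hypothetical placeholders. Each cell stores at most one policy, so a request for an arbitrary target descriptor can only be served by the nearest stored entry.

```python
class GridArchive:
    """Illustrative grid repertoire keyed by a discretized behavior descriptor.

    Assumption: descriptors lie in [low, high] along every dimension and each
    cell keeps only the highest-fitness policy inserted so far.
    """

    def __init__(self, bins, low=0.0, high=1.0):
        self.bins = bins
        self.low, self.high = low, high
        self.cells = {}  # cell index -> (fitness, descriptor, policy)

    def _cell(self, descriptor):
        idx = []
        for d in descriptor:
            frac = (d - self.low) / (self.high - self.low)
            # Clamp so that descriptors on the upper bound fall in the last bin.
            idx.append(min(self.bins - 1, max(0, int(frac * self.bins))))
        return tuple(idx)

    def add(self, policy, fitness, descriptor):
        """Insert a policy; return True if it became the elite of its cell."""
        key = self._cell(descriptor)
        if key not in self.cells or fitness > self.cells[key][0]:
            self.cells[key] = (fitness, tuple(descriptor), policy)
            return True
        return False

    def nearest(self, target):
        """Return the stored entry whose descriptor is closest to the target."""
        def sq_dist(entry):
            return sum((a - b) ** 2 for a, b in zip(entry[1], target))
        return min(self.cells.values(), key=sq_dist)


archive = GridArchive(bins=4)
archive.add("policy_a", fitness=1.0, descriptor=(0.1, 0.1))
archive.add("policy_b", fitness=2.0, descriptor=(0.2, 0.2))  # replaces policy_a
archive.add("policy_c", fitness=0.3, descriptor=(0.9, 0.9))
fitness, descriptor, policy = archive.nearest((0.3, 0.3))
print(policy, descriptor)
```

The query for (0.3, 0.3) returns a policy whose descriptor is only close to the target, not equal to it. A policy conditioned directly on the descriptor, and additionally on a textual description as proposed above, is one way to remove this restriction.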

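By using a Large Language Model to augment the repertoire with natural language descriptions of trajectories, we propose a method that addresses these Quality-Diversity limitations, allowing a user to specify an arbitrary target behavior descriptor and to shape the generated trajectory with a high-level textual prompt. We also propose an LLM-based evaluation approach and develop a benchmark based on simulated robot navigation in a 2D maze for experimental validation.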