Lifelong learning offers a promising paradigm for building a generalist agent
that learns and adapts over its lifespan. Unlike traditional lifelong learning
problems in image and text domains, which primarily involve the transfer of
declarative knowledge of entities and concepts, lifelong learning in
decision-making (LLDM) also necessitates the transfer of procedural knowledge,
such as actions and behaviors. To advance research in LLDM, we introduce
LIBERO, a novel benchmark of lifelong learning for robot manipulation.
Specifically, LIBERO highlights five key research topics in LLDM: 1) how to
efficiently transfer declarative knowledge, procedural knowledge, or the
mixture of both; 2) how to design effective policy architectures and 3)
effective algorithms for LLDM; 4) the robustness of a lifelong learner with
respect to task ordering; and 5) the effect of model pretraining for LLDM. We
develop an extendible procedural generation pipeline that can in principle
generate infinitely many tasks. For benchmarking purposes, we create four task
suites (130 tasks in total) that we use to investigate the above-mentioned
research topics. To support sample-efficient learning, we provide high-quality
human-teleoperated demonstration data for all tasks. Our extensive experiments
present several insightful or even unexpected discoveries: sequential
finetuning outperforms existing lifelong learning methods in forward transfer,
no single visual encoder architecture excels at all types of knowledge
transfer, and naive supervised pretraining can hinder agents' performance in
the subsequent LLDM. The code and the datasets are available on the project
website.

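LIBERO is a new benchmark for lifelong learning in robot manipulation. It highlights five key research topics: (1) how to efficiently transfer declarative knowledge, procedural knowledge, or a mixture of both; (2) how to design effective policy architectures; (3) how to design effective algorithms for LLDM; (4) how robust a lifelong learner is to task ordering; and (5) how model pretraining affects LLDM. To support sample-efficient learning, it also provides human-teleoperated demonstrations for all tasks.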

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

This paper investigates the idea of encoding object-centered representations
in the design of the reward function and policy architectures of a
language-guided reinforcement learning agent. This is done using a combination
of object-wise permutation-invariant networks inspired by Deep Sets and
gated-attention mechanisms. In a 2D procedurally generated world where agents
pursuing goals expressed in natural language navigate and interact with
objects, we show that these architectures generalize strongly to
out-of-distribution goals. We study the generalization to varying numbers of
objects at test time and further extend the object-centered architectures to
goals involving relational reasoning.

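This paper studies the idea of encoding object-centered representations into the reward function and policy architecture of a reinforcement learning agent guided by natural language. Combining object-wise permutation-invariant networks inspired by Deep Sets with gated-attention mechanisms, we show in a 2D procedurally generated world that these architectures generalize strongly to out-of-distribution goals. We also study generalization to varying numbers of objects at test time and extend the object-centered architectures to goals involving relational reasoning.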
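
The combination of object-wise permutation-invariant networks and gated attention described above can be illustrated with a minimal sketch. This is not the paper's architecture: the class name `GatedDeepSet`, the layer sizes, the random untrained weights, and the choice of sum pooling are all assumptions made for illustration. The sketch shows only the general idea. A goal embedding produces a sigmoid gate that rescales each object's features, a shared encoder is applied to every object, and the results are summed, so the output does not depend on object order and accepts any number of objects.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    """Random weights in [-1, 1]; a stand-in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class GatedDeepSet:
    """Hypothetical goal-gated, permutation-invariant object encoder."""

    def __init__(self, obj_dim, goal_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.W_gate = random_matrix(obj_dim, goal_dim, rng)    # goal -> feature gate
        self.W_phi = random_matrix(hidden_dim, obj_dim, rng)   # shared per-object encoder
        self.W_rho = random_matrix(out_dim, hidden_dim, rng)   # readout on pooled features

    def forward(self, objects, goal):
        # Gated attention: the goal decides how much each object feature counts.
        gate = [sigmoid(z) for z in matvec(self.W_gate, goal)]
        pooled = [0.0] * len(self.W_phi)
        for obj in objects:
            gated = [o * g for o, g in zip(obj, gate)]
            hidden = [max(0.0, h) for h in matvec(self.W_phi, gated)]  # ReLU
            # Sum pooling makes the result independent of object order.
            pooled = [p + h for p, h in zip(pooled, hidden)]
        return matvec(self.W_rho, pooled)


net = GatedDeepSet(obj_dim=4, goal_dim=3, hidden_dim=8, out_dim=2)
goal = [0.5, -1.0, 0.25]  # assumed goal embedding
objects = [
    [1.0, 0.0, 0.5, -0.5],
    [0.2, 0.3, -0.1, 0.9],
    [-1.0, 0.4, 0.0, 0.1],
]
out = net.forward(objects, goal)
shuffled = net.forward(list(reversed(objects)), goal)
fewer = net.forward(objects[:2], goal)
```

Here `out` and `shuffled` agree up to floating-point rounding, and `fewer` has the same output size even though one object was dropped, which is the property that lets such an encoder be evaluated with a different number of objects at test time.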