Imitation learning enables autonomous agents to learn from human examples,
without the need for a reward signal. Still, if the provided dataset does not
encapsulate the task correctly, or when the task is too complex to be modeled,
such agents fail to reproduce the expert policy. We propose to recover from
these failures through online adaptation. Our approach combines the action
proposal coming from a pre-trained policy with relevant experience recorded by
an expert. The combination results in an adapted action that closely follows
the expert. Our experiments show that an adapted agent performs better than its
pure imitation learning counterpart. Notably, adapted agents can achieve
reasonable performance even when the base, non-adapted policy catastrophically
fails.
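
The combination of a policy's action proposal with recorded expert experience can be pictured with a small sketch. The abstract does not say how relevant experience is retrieved or how the two sources are merged, so everything here is an assumption made for illustration: the nearest-neighbour lookup over observation embeddings, the inverse-distance weighting, and the names `adapt_action`, `k`, and `blend` are hypothetical and are not the authors' method.

```python
import numpy as np


def adapt_action(policy_action, obs_embedding, expert_embeddings,
                 expert_actions, k=3, blend=0.5):
    """Blend a policy's proposed action with nearby expert actions.

    Illustrative assumption only: retrieval is a k-nearest-neighbour
    search in embedding space and the merge is a convex combination.
    """
    policy_action = np.asarray(policy_action, dtype=float)
    expert_embeddings = np.asarray(expert_embeddings, dtype=float)
    expert_actions = np.asarray(expert_actions, dtype=float)

    # Distance from the current observation to every expert record.
    dists = np.linalg.norm(expert_embeddings - obs_embedding, axis=1)

    # Indices of the k closest expert records.
    nearest = np.argsort(dists)[:k]

    # Closer records get larger weights (inverse distance, normalised).
    weights = 1.0 / (dists[nearest] + 1e-8)
    weights /= weights.sum()
    expert_action = weights @ expert_actions[nearest]

    # blend = 0 keeps the pure policy action; blend = 1 copies the expert.
    return (1.0 - blend) * policy_action + blend * expert_action
```

With `blend=0` the function returns the unmodified policy action, so the base policy is recovered as a special case; larger values pull the action toward what the expert did in similar situations.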

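We propose to recover from failures in imitation learning through online adaptation. Our method combines the action proposal of a pre-trained policy with relevant experience recorded by an expert, so that the adapted action follows the expert policy more closely. Experiments show that the adapted agent outperforms its pure imitation learning counterpart; notably, the adapted agent can still achieve reasonable performance when the base policy fails catastrophically.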

Online Adaptation for Enhancing Imitation Learning Policies

In this paper, we propose the use of generative artificial intelligence (AI)
to improve zero-shot performance of a pre-trained policy by altering
observations during inference. Modern robotic systems, powered by advanced
neural networks, have demonstrated remarkable capabilities on pre-trained
tasks. However, generalizing and adapting to new objects and environments is
challenging, and fine-tuning visuomotor policies is time-consuming. To overcome
these issues we propose Robotic Policy Inference via Synthetic Observations
(ROSO). ROSO uses Stable Diffusion to pre-process a robot's observation of
novel objects at inference time so that it falls within the observation
distribution of the pre-trained policy. This novel paradigm allows us to
transfer learned knowledge from known tasks to previously unseen scenarios,
enhancing the robot's adaptability without requiring lengthy fine-tuning. Our
experiments show that incorporating generative AI into robotic inference
significantly improves success rates, completing up to 57% of the tasks on
which the pre-trained policy alone fails.
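
The inference-time idea, editing the observation rather than the policy, can be sketched with toy stand-ins. Nothing below is the ROSO implementation: `toy_policy` is a hypothetical frozen policy that only recognises a red object, and `toy_editor` is a hypothetical placeholder for the generative editing step, which in the paper is performed with Stable Diffusion. The sketch only shows where that step sits in the pipeline.

```python
import numpy as np


def toy_policy(image):
    """Stand-in for a frozen pre-trained policy.

    Returns the (row, col) of the first pure-red pixel, the only object
    it "knows", or None when no such pixel is visible.
    """
    red = (image[..., 0] == 255) & (image[..., 1] == 0) & (image[..., 2] == 0)
    hits = np.argwhere(red)
    return tuple(int(v) for v in hits[0]) if len(hits) else None


def toy_editor(image):
    """Stand-in for the generative editing step.

    Repaints the novel pure-blue object in the red the policy was
    trained on; a real system would call an image-editing model here.
    """
    edited = image.copy()
    blue = (image[..., 0] == 0) & (image[..., 1] == 0) & (image[..., 2] == 255)
    edited[blue] = (255, 0, 0)
    return edited


def infer_with_synthetic_observation(image, editor, policy):
    """Edit the observation first, then query the unchanged policy."""
    return policy(editor(image))


# A 4x4 RGB scene containing one unseen (blue) object at row 2, column 1.
scene = np.zeros((4, 4, 3), dtype=np.uint8)
scene[2, 1] = (0, 0, 255)
```

Calling `toy_policy(scene)` directly finds nothing, while `infer_with_synthetic_observation(scene, toy_editor, toy_policy)` locates the object, and the policy's weights are never touched.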

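We propose using generative artificial intelligence (AI) to alter observations during inference in order to improve the zero-shot performance of a pre-trained policy. Stable Diffusion is used to pre-process the robot's observations of novel objects, improving the robot's adaptability without lengthy fine-tuning.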