In this paper, we study offline-to-online Imitation Learning (IL), which pretrains an imitation policy from static demonstration data and then finetunes it quickly with minimal environment interaction. We find that the na\"ive combination of existing offline IL and online IL methods tends to perform poorly in this setting, because the initial discriminator (often used in online IL) acts randomly and at odds with the policy initialization, leading to misguided policy optimization and $\textit{unlearning}$ of pretraining knowledge. To overcome this challenge, we propose a principled offline-to-online IL method, named $\texttt{OLLIE}$, that simultaneously learns a near-expert policy initialization and an $\textit{aligned discriminator initialization}$. Both can be seamlessly integrated into online IL, achieving smooth and fast finetuning. Empirically, $\texttt{OLLIE}$ consistently and significantly outperforms the baseline methods on $\textbf{20}$ challenging tasks, from continuous control to vision-based domains, in terms of performance, demonstration efficiency, and convergence speed. This work may serve as a foundation for further exploration of pretraining and finetuning in the context of IL.

OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning