Offline Imitation Learning (IL) with imperfect demonstrations has garnered
increasing attention owing to the scarcity of expert data in many real-world
domains. A fundamental problem in this scenario is how to extract positive
behaviors from noisy data. Current approaches generally select data based on
state-action similarity to the given expert demonstrations, neglecting valuable
information in (potentially abundant) $\textit{diverse}$ state-action pairs that
deviate from expert ones. In this paper, we introduce a
simple yet effective data selection method that identifies positive behaviors
based on their resultant states -- a more informative criterion enabling
explicit utilization of dynamics information and effective extraction of both
expert and beneficial diverse behaviors. Further, we devise a lightweight
behavior cloning algorithm capable of leveraging the expert and selected data
correctly. In the experiments, we evaluate our method on a suite of complex and
high-dimensional offline IL benchmarks, including continuous-control and
vision-based tasks. The results demonstrate that our method achieves
state-of-the-art performance, outperforming existing methods on
$\textbf{20/21}$ benchmarks, typically by $\textbf{2-5x}$, while maintaining a
comparable runtime to Behavior Cloning ($\texttt{BC}$).

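Offline imitation learning (IL) has attracted growing attention because expert data is scarce in real-world domains. This paper introduces a simple yet effective data selection method that identifies positive behaviors by their resultant states, making better use of dynamics information and effectively extracting both expert behaviors and beneficial diverse behaviors. Experiments on complex, high-dimensional offline IL benchmarks show that the method achieves state-of-the-art performance, outperforming existing methods on 20/21 benchmarks, typically by 2-5x, while keeping a runtime comparable to Behavior Cloning (BC).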
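
The idea of selecting data by resultant states can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a plain nearest-neighbor test in state space, and the function name `select_by_resultant_state` and the `radius` threshold are hypothetical. A transition from the noisy data is kept when the state it leads to lies close to some expert state, regardless of how similar its state-action pair is to the expert's.

```python
import numpy as np

def select_by_resultant_state(next_states, expert_states, radius):
    """Return a boolean mask over transitions whose resultant state lies
    within `radius` (Euclidean distance) of at least one expert state.

    Hypothetical helper for illustration; the distance test is an assumption.
    """
    next_states = np.asarray(next_states, dtype=float)
    expert_states = np.asarray(expert_states, dtype=float)
    # Pairwise differences, shape (num_transitions, num_expert_states, state_dim).
    diffs = next_states[:, None, :] - expert_states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep a transition if its closest expert state is near enough.
    return dists.min(axis=1) <= radius

# Toy data: two expert states and three resultant states from noisy transitions.
expert_states = np.array([[0.0, 0.0], [1.0, 1.0]])
next_states = np.array([[0.1, 0.0], [5.0, 5.0], [1.0, 0.9]])
mask = select_by_resultant_state(next_states, expert_states, radius=0.2)
```

Under these assumptions, the first and third transitions are kept and the second is dropped. The selected transitions could then be passed, together with the expert data, to a behavior cloning loss.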

How to Leverage Diverse Demonstrations in Offline Imitation Learning

Imitation learning has emerged as a promising approach for addressing
sequential decision-making problems, with the assumption that expert
demonstrations are optimal. However, in real-world scenarios, expert
demonstrations are often imperfect, leading to challenges in effectively
applying imitation learning. While existing research has focused on optimizing
with imperfect demonstrations, the training typically requires a certain
proportion of optimal demonstrations to guarantee performance. To tackle these
problems, we propose to purify imperfect demonstrations of potential
perturbations and then conduct imitation learning from the purified
demonstrations. Motivated by the success of diffusion models, we introduce a
two-step purification via the diffusion process. In the first step, we apply a
forward diffusion process to effectively smooth out the potential perturbations
in imperfect demonstrations by introducing additional noise. Subsequently, a
reverse generative process is utilized to recover the optimal expert
demonstrations from the diffused ones. We provide theoretical evidence
supporting our approach, demonstrating that the total variation distance between
the purified and optimal demonstration distributions can be upper-bounded. The
evaluation results on MuJoCo demonstrate the effectiveness of our method from
different aspects.

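A two-step purification method based on the diffusion process removes potential perturbations in imperfect demonstrations by introducing noise, then recovers the optimal expert demonstrations from the diffused data; the evaluation results demonstrate the method's effectiveness.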
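
The first purification step, forward diffusion, can be sketched as follows. This is a minimal illustration that assumes the standard DDPM-style closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with a linear noise schedule; the abstract does not state the exact parameterization, and the name `forward_diffuse`, the schedule, and the step `t` are hypothetical choices.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Diffuse demonstrations x0 to step t (1-indexed) in closed form.

    Assumes a DDPM-style variance-preserving forward process; this is an
    illustrative choice, not necessarily the paper's exact formulation.
    """
    x0 = np.asarray(x0, dtype=float)
    betas = np.asarray(betas, dtype=float)
    # Cumulative product of (1 - beta) over steps 1..t.
    alpha_bar_t = np.cumprod(1.0 - betas)[t - 1]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule
demo = np.zeros((50, 3))                # placeholder for imperfect state-action data
diffused = forward_diffuse(demo, t=100, betas=betas, rng=rng)
```

The added Gaussian noise is what smooths out the perturbations. The second step, the reverse generative process, needs a trained denoising model and is not shown here.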