In offline Imitation Learning (IL), an agent aims to learn an optimal expert
behavior policy without additional online environment interactions. However, in
many real-world scenarios, such as robotic manipulation, the offline dataset
is collected from suboptimal behaviors without rewards. Because expert data are
scarce, agents tend to simply memorize poor trajectories and are vulnerable to
variations in the environment, lacking the ability to generalize to new
environments. To effectively remove spurious
features that would otherwise bias the agent and hinder generalization, we
propose a framework named \underline{O}ffline \underline{I}mitation
\underline{L}earning with \underline{C}ounterfactual data
\underline{A}ugmentation (OILCA). In particular, we leverage an identifiable
variational autoencoder to generate \textit{counterfactual} samples. We
theoretically analyze the counterfactual identification and the improvement of
generalization. Moreover, we conduct extensive experiments to demonstrate that
our approach significantly outperforms various baselines on both the
\textsc{DeepMind Control Suite} benchmark for in-distribution robustness and
the \textsc{CausalWorld} benchmark for out-of-distribution generalization.

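Summary: For offline imitation learning, this work uses counterfactual data
augmentation, generating counterfactual samples to remove the spurious features
that hinder an agent's generalization. This addresses the problems caused by
scarce expert data, the memorization of poor trajectories, and variations in
the environment. Experimental results show that the method significantly
outperforms the baselines in both in-distribution robustness and
out-of-distribution generalization.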