We consider offline Imitation Learning from corrupted demonstrations where a
constant fraction of data can be noise or even arbitrary outliers. Classical
approaches such as Behavior Cloning assume that demonstrations are collected
by a presumably optimal expert, and hence may fail drastically when learning from
corrupted demonstrations. We propose a novel robust algorithm that minimizes a
Median-of-Means (MOM) objective, which guarantees accurate estimation of the
policy even in the presence of a constant fraction of outliers. Our theoretical
analysis shows that our robust method in the corrupted setting enjoys nearly
the same error scaling and sample complexity guarantees as the classical
Behavior Cloning in the expert demonstration setting. Our experiments on
continuous-control benchmarks validate that our method exhibits the predicted
robustness and effectiveness, and achieves competitive results compared to
existing imitation learning methods.
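
To make the Median-of-Means idea concrete, the following is a minimal sketch of the basic MOM estimator applied to scalar per-sample losses. It is not the objective or training procedure of our method; the function name `median_of_means`, the block count, and the synthetic loss values are all assumptions chosen for illustration. The sketch splits the samples into blocks, averages each block, and returns the median of the block means, so a minority of contaminated blocks cannot move the estimate.

```python
import numpy as np


def median_of_means(values, num_blocks, rng=None):
    """Median of block means (illustrative helper, not the paper's objective).

    values:     1-D array of scalar samples (e.g. per-sample losses).
    num_blocks: number of blocks to split the samples into.
    rng:        optional numpy Generator used to shuffle before splitting.
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= num_blocks <= values.size:
        raise ValueError("num_blocks must be between 1 and the number of samples")
    if rng is not None:
        # Shuffle so that outliers are not grouped by their original order.
        values = rng.permutation(values)
    blocks = np.array_split(values, num_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))


# Synthetic data (assumed for illustration): 95% clean losses near 1.0,
# 5% arbitrary outliers with a very large value.
rng = np.random.default_rng(0)
clean_losses = rng.normal(loc=1.0, scale=0.1, size=950)
outlier_losses = np.full(50, 1000.0)
losses = np.concatenate([clean_losses, outlier_losses])

plain_mean = float(losses.mean())
mom_estimate = median_of_means(losses, num_blocks=200, rng=rng)

print(f"plain mean:      {plain_mean:.3f}")
print(f"median of means: {mom_estimate:.3f}")
```

With 50 outliers and 200 blocks, at most 50 blocks can be contaminated, so the median is always taken among clean block means and stays close to 1.0, whereas the plain mean is dominated by the outliers. In general the number of blocks must exceed twice the number of outliers for this argument to hold.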

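This paper proposes a new algorithm for offline imitation learning from demonstrations that contain noise or outliers. By minimizing a Median-of-Means objective, the algorithm estimates the policy accurately and remains robust to outliers. The theoretical analysis shows that, even with corrupted data, it enjoys nearly the same error and sample complexity guarantees as classical Behavior Cloning.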