Generalizable object manipulation skills are critical for intelligent, multi-functional robots working in complex real-world scenes. Despite recent progress in reinforcement learning, it remains very challenging to learn a generalizable manipulation policy that can handle a category of geometrically diverse articulated objects. In this work, we tackle this category-level object manipulation policy learning problem via imitation learning in a task-agnostic manner, where we assume no handcrafted dense rewards but only a terminal reward. For this novel and challenging generalizable policy learning problem, we identify several key issues that can cause previous imitation learning algorithms to fail and that hinder generalization to unseen instances. We then propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of the discriminator, and instance balancing for the expert buffer, that accurately pinpoint and tackle these issues and can benefit category-level manipulation policy learning regardless of the task. Our experiments on the ManiSkill benchmarks demonstrate a remarkable improvement on all tasks, and our ablation studies further validate the contribution of each proposed technique.

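This paper uses imitation learning to address the problem of learning object manipulation skills in complex scenes. It proposes a generalizable policy learning method that applies across tasks and requires no handcrafted reward. Through several key techniques, including generative adversarial self-imitation learning, a progressively growing discriminator, and instance balancing of the expert buffer, it significantly improves the efficiency and generalization ability of category-level manipulation policy learning. Experimental results show clear improvements across the ManiSkill benchmarks.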
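
The abstract names instance balancing for the expert buffer but does not say how it is implemented. As a minimal sketch of one plausible reading, the buffer below first picks an object instance uniformly at random and then picks a transition from that instance, so instances with many demonstrations do not dominate the expert batches. The class name `InstanceBalancedBuffer`, its methods, and the two-stage sampling rule are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


class InstanceBalancedBuffer:
    """Hypothetical expert buffer that samples uniformly over object instances."""

    def __init__(self, seed=0):
        # Maps an instance id (e.g. one specific cabinet model) to its transitions.
        self._by_instance = defaultdict(list)
        self._rng = random.Random(seed)

    def add(self, instance_id, transition):
        """Store one expert transition under the object instance it came from."""
        self._by_instance[instance_id].append(transition)

    def sample(self, batch_size):
        """Draw a batch: uniform over instances first, then uniform within one."""
        if not self._by_instance:
            raise ValueError("cannot sample from an empty buffer")
        instance_ids = list(self._by_instance)
        batch = []
        for _ in range(batch_size):
            instance_id = self._rng.choice(instance_ids)
            batch.append(self._rng.choice(self._by_instance[instance_id]))
        return batch


if __name__ == "__main__":
    buffer = InstanceBalancedBuffer(seed=0)
    # Assumed toy data: one instance has 100x more demonstrations than the other.
    for step in range(1000):
        buffer.add("cabinet_A", ("cabinet_A", step))
    for step in range(10):
        buffer.add("cabinet_B", ("cabinet_B", step))
    batch = buffer.sample(2000)
    share_b = sum(1 for t in batch if t[0] == "cabinet_B") / len(batch)
    print(f"share of cabinet_B in batch: {share_b:.2f}")
```

With plain uniform sampling over transitions, the rarer instance in this toy setup would make up about 1% of each batch; under the two-stage rule each instance is expected to contribute about half.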

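Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations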