Existing imitation learning (IL) methods such as inverse reinforcement
learning (IRL) usually rely on a double-loop training process that alternates
between learning a reward function and learning a policy, and they tend to
suffer from long training times and high variance. In this work, we identify the
benefits of differentiable
physics simulators and propose a new IL method, Imitation Learning via
Differentiable Physics (ILD), which eliminates the double-loop design and
achieves significant improvements in final performance, convergence speed, and
stability. The proposed ILD incorporates the differentiable physics simulator
as a physics prior into its computational graph for policy learning. It unrolls
the dynamics by sampling actions from a parameterized policy, minimizes the
distance between the expert trajectory and the agent trajectory, and
back-propagates the gradient into the policy through temporal physics operators.
With the physics prior, ILD policies not only transfer to unseen environment
specifications but also achieve higher final performance on a variety of tasks.
In addition, ILD naturally forms a single-loop structure, which
significantly improves the stability and training speed. To simplify the
complex optimization landscape induced by temporal physics operations, ILD
dynamically selects the learning objectives for each state during optimization.
In our experiments, we show that ILD outperforms state-of-the-art methods in a
variety of continuous control tasks with Brax, requiring only one expert
demonstration. In addition, ILD can be applied to challenging deformable object
manipulation tasks and generalizes to unseen configurations.
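
The single-loop idea described above can be illustrated with a minimal JAX sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the simulator is a toy 1-D point mass instead of Brax, the policy is a deterministic linear map (the abstract mentions sampled actions), and the per-state target selection is approximated by a symmetric nearest-neighbour (Chamfer-style) distance, which is only one plausible reading of "dynamically selects the learning objectives". The names `step`, `policy`, `rollout`, `imitation_loss`, `update`, and `run_demo` are hypothetical.

```python
import jax
import jax.numpy as jnp

DT = 0.1  # toy integration time step (assumption)


def step(state, action):
    """Toy differentiable dynamics: 1-D point mass, state = [position, velocity]."""
    new_vel = state[1] + DT * action
    new_pos = state[0] + DT * new_vel
    return jnp.stack([new_pos, new_vel])


def policy(params, state):
    """Hypothetical deterministic linear policy: state -> scalar force."""
    return jnp.dot(params["w"], state) + params["b"]


def rollout(params, init_state, horizon):
    """Unroll the dynamics under the policy; returns states of shape (horizon, 2)."""
    def body(state, _):
        next_state = step(state, policy(params, state))
        return next_state, next_state

    _, traj = jax.lax.scan(body, init_state, None, length=horizon)
    return traj


def imitation_loss(params, init_state, expert_traj):
    """Trajectory distance with per-state nearest-neighbour target selection."""
    agent_traj = rollout(params, init_state, expert_traj.shape[0])
    # Pairwise squared distances, shape (T_agent, T_expert).
    diff = agent_traj[:, None, :] - expert_traj[None, :, :]
    dist = jnp.sum(diff ** 2, axis=-1)
    # Each agent state is pulled toward its nearest expert state, and each
    # expert state toward its nearest agent state (Chamfer-style, assumed).
    return jnp.mean(jnp.min(dist, axis=1)) + jnp.mean(jnp.min(dist, axis=0))


@jax.jit
def update(params, init_state, expert_traj, lr=0.003):
    """One gradient step; gradients flow through every simulator step."""
    loss, grads = jax.value_and_grad(imitation_loss)(params, init_state, expert_traj)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss


def run_demo(num_iters=300):
    """Fit the policy to a single synthetic expert demonstration."""
    init_state = jnp.array([1.0, 0.0])
    # Synthetic expert: a spring-damper controller driving the mass to the origin.
    expert_params = {"w": jnp.array([-1.0, -0.5]), "b": jnp.array(0.0)}
    expert_traj = rollout(expert_params, init_state, 30)
    params = {"w": jnp.zeros(2), "b": jnp.array(0.0)}
    first_loss = imitation_loss(params, init_state, expert_traj)
    for _ in range(num_iters):
        params, _ = update(params, init_state, expert_traj)
    final_loss = imitation_loss(params, init_state, expert_traj)
    return float(first_loss), float(final_loss)
```

Calling `run_demo()` returns the imitation loss before and after training. There is no inner reward-learning loop: a single gradient-descent loop updates the policy directly from the trajectory distance, with the gradient obtained by differentiating through `jax.lax.scan`.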

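This paper proposes a new imitation learning method based on a differentiable physics simulator (ILD). The method incorporates the physics prior into the computational graph for policy learning and dynamically selects the learning objective for each state during optimization, yielding a single-loop structure that improves stability and training speed. In the experiments, ILD performs well on continuous control tasks and deformable object manipulation tasks while requiring only one expert demonstration.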