This paper studies Learning from Observations (LfO) for imitation learning
with access to state-only demonstrations. In contrast to Learning from
Demonstration (LfD), which involves both action and state supervision, LfO is
more practical in leveraging previously inapplicable resources (e.g., videos),
yet more challenging due to the incomplete expert guidance. In this paper, we
investigate LfO and how it differs from LfD from both theoretical and practical
perspectives. We first prove that, if the modeling approach of GAIL is followed,
the gap between LfD and LfO lies in the disagreement between the inverse
dynamics models of the imitator and the expert. More importantly, this gap
admits an upper bound given by a negative causal entropy, which can be
minimized in a model-free way. We term our method
Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the
conventional LfO method by further bridging the gap to LfD. Extensive
empirical results on challenging benchmarks indicate that our method attains
consistent improvements over other LfO counterparts.
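The first claim can be made concrete with a short derivation. The sketch below is an illustrative reconstruction, not the paper's proof. It assumes that the discrepancy between occupancy measures is measured with the KL divergence (GAIL's discriminator objective corresponds to a Jensen-Shannon divergence instead). It also assumes that the imitator and the expert act in the same environment, so both joint distributions over (s, a, s') share the transition dynamics T(s' | s, a). Here ρ_π and ρ_E denote the occupancy measures of the imitator and the expert.

```latex
\begin{align}
\rho_\pi(s,a,s') &= \rho_\pi(s,a)\,T(s' \mid s,a), \qquad
\rho_E(s,a,s') = \rho_E(s,a)\,T(s' \mid s,a), \\
D_{\mathrm{KL}}\big(\rho_\pi(s,a,s') \,\|\, \rho_E(s,a,s')\big)
  &= D_{\mathrm{KL}}\big(\rho_\pi(s,a) \,\|\, \rho_E(s,a)\big), \\
D_{\mathrm{KL}}\big(\rho_\pi(s,a,s') \,\|\, \rho_E(s,a,s')\big)
  &= D_{\mathrm{KL}}\big(\rho_\pi(s,s') \,\|\, \rho_E(s,s')\big)
   + \mathbb{E}_{(s,s') \sim \rho_\pi}\Big[D_{\mathrm{KL}}\big(\rho_\pi(a \mid s,s') \,\|\, \rho_E(a \mid s,s')\big)\Big], \\
\underbrace{D_{\mathrm{KL}}\big(\rho_\pi(s,a) \,\|\, \rho_E(s,a)\big)}_{\text{LfD}}
 - \underbrace{D_{\mathrm{KL}}\big(\rho_\pi(s,s') \,\|\, \rho_E(s,s')\big)}_{\text{LfO}}
  &= \mathbb{E}_{(s,s') \sim \rho_\pi}\Big[D_{\mathrm{KL}}\big(\rho_\pi(a \mid s,s') \,\|\, \rho_E(a \mid s,s')\big)\Big] \;\ge\; 0.
\end{align}
```

The shared factor T cancels inside the logarithm, so the LfD objective equals the KL divergence over full transitions. The chain rule then splits that divergence into two parts: the LfO objective on (s, s'), plus the expected disagreement of the inverse dynamics ρ(a | s, s'). The gap is therefore nonnegative, and it vanishes only when the two inverse dynamics models agree ρ_π-almost everywhere.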

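This paper studies Learning from Observations (LfO), i.e., imitation learning from state-only demonstrations. Approaching the problem from both theoretical and practical perspectives, we first prove that, if the modeling approach of GAIL is followed, the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert. We propose the Inverse-Dynamics-Disagreement-Minimization (IDDM) method, which enhances conventional LfO methods by further narrowing the gap to LfD. Empirical results on challenging benchmarks show that our method achieves consistent improvements over other LfO methods.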
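The identity behind the LfD/LfO gap can also be checked numerically. This is a toy check, not the authors' code. It builds arbitrary positive distributions over small finite state and action sets as hypothetical stand-ins for ρ_π(s, a) and ρ_E(s, a). These are not occupancy measures of actual policies, since the identity only needs the shared dynamics factor. Both distributions are multiplied by the same randomly drawn T(s' | s, a). The script then confirms that the LfD divergence equals the LfO divergence plus the expected inverse-dynamics divergence. The helper names `make_problem` and `gap_terms` are illustrative.

```python
import math
import random


def normalize(weights):
    """Scale a dict of positive weights so that its values sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def kl(p, q):
    """KL(p || q) for two dicts over the same finite support (q > 0 wherever p > 0)."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def make_problem(n_states=3, n_actions=2, seed=0):
    """Hypothetical toy problem: random positive rho_pi(s,a), rho_E(s,a) and a shared T(s'|s,a)."""
    rng = random.Random(seed)
    states, actions = list(range(n_states)), list(range(n_actions))
    pairs = [(s, a) for s in states for a in actions]
    rho_pi = normalize({sa: rng.uniform(0.1, 1.0) for sa in pairs})
    rho_e = normalize({sa: rng.uniform(0.1, 1.0) for sa in pairs})
    T = {sa: normalize({s2: rng.uniform(0.1, 1.0) for s2 in states}) for sa in pairs}
    return states, actions, rho_pi, rho_e, T


def gap_terms(states, actions, rho_pi, rho_e, T):
    """Return the LfD KL, the KL over full transitions, the LfO KL and the inverse-dynamics KL."""
    # Joint rho(s, a, s') = rho(s, a) * T(s' | s, a); both agents share the same T.
    joint_pi = {(s, a, s2): rho_pi[(s, a)] * T[(s, a)][s2] for (s, a) in rho_pi for s2 in states}
    joint_e = {(s, a, s2): rho_e[(s, a)] * T[(s, a)][s2] for (s, a) in rho_e for s2 in states}
    # State-transition marginals rho(s, s') = sum_a rho(s, a, s').
    ss_pi, ss_e = {}, {}
    for (s, _, s2), w in joint_pi.items():
        ss_pi[(s, s2)] = ss_pi.get((s, s2), 0.0) + w
    for (s, _, s2), w in joint_e.items():
        ss_e[(s, s2)] = ss_e.get((s, s2), 0.0) + w
    # Inverse dynamics rho(a | s, s') and its disagreement, averaged under rho_pi(s, s').
    inverse_kl = 0.0
    for (s, s2), w in ss_pi.items():
        p_a = {a: joint_pi[(s, a, s2)] / w for a in actions}
        q_a = {a: joint_e[(s, a, s2)] / ss_e[(s, s2)] for a in actions}
        inverse_kl += w * kl(p_a, q_a)
    return {
        "lfd": kl(rho_pi, rho_e),
        "lfd_joint": kl(joint_pi, joint_e),
        "lfo": kl(ss_pi, ss_e),
        "inverse_dynamics": inverse_kl,
    }


terms = gap_terms(*make_problem())
print(terms)
```

Up to floating-point error, `terms["lfd"]` matches both `terms["lfd_joint"]` and `terms["lfo"] + terms["inverse_dynamics"]` for any seed, because the only structural assumption used is the shared factor T.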


Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Imitation from observation (IfO) is the problem of learning directly from
state-only demonstrations without having access to the demonstrator's actions.
The lack of action information both distinguishes IfO from most of the
imitation learning literature and sets it apart as a method that may
enable agents to learn from a large set of previously inapplicable resources,
such as internet videos. In this paper, we propose both a general framework for
IfO approaches and a new IfO approach based on generative adversarial
networks called generative adversarial imitation from observation (GAIfO). We
conduct experiments in two different settings: (1) when demonstrations consist
of low-dimensional, manually-defined state features, and (2) when
demonstrations consist of high-dimensional, raw visual data. We demonstrate
that our approach performs comparably to classical imitation learning
approaches (which have access to the demonstrator's actions) and significantly
outperforms existing imitation from observation methods in high-dimensional
simulation environments.
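The central idea that separates GAIfO from GAIL is that the discriminator scores state transitions (s, s') rather than state-action pairs, so demonstrator actions are never needed. The sketch below illustrates only that idea, under several stated assumptions. All of the following are hypothetical simplifications:

- the linear logistic discriminator (practical implementations use neural networks),
- its learning rate,
- the surrogate reward -log(1 - D(s, s')), which is one common GAIL-style choice (label and sign conventions differ between papers),
- the toy data.

The policy-optimization step that would consume the reward is omitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TransitionDiscriminator:
    """Hypothetical linear discriminator D(s, s') that sees state transitions only, never actions."""

    def __init__(self, state_dim, lr=0.1):
        self.w = [0.0] * (2 * state_dim)  # weights over the concatenated (s, s')
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, s, s_next):
        """D(s, s'): estimated probability that the transition came from the expert."""
        x = list(s) + list(s_next)
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, expert_batch, imitator_batch):
        """One gradient-ascent step on the logistic log-likelihood (expert = 1, imitator = 0)."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for batch, label in ((expert_batch, 1.0), (imitator_batch, 0.0)):
            for s, s_next in batch:
                x = list(s) + list(s_next)
                err = label - self.prob_expert(s, s_next)
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi
                grad_b += err
        n = len(expert_batch) + len(imitator_batch)
        self.w = [wi + self.lr * gi / n for wi, gi in zip(self.w, grad_w)]
        self.b += self.lr * grad_b / n

    def reward(self, s, s_next):
        """Surrogate reward for the imitator: larger when a transition looks expert-like."""
        return -math.log(max(1.0 - self.prob_expert(s, s_next), 1e-12))


# Toy usage (hypothetical data): 1-D states; the expert moves forward by 1, the imitator stays put.
states = [i / 10 for i in range(10)]
expert = [((s,), (s + 1.0,)) for s in states]
imitator = [((s,), (s,)) for s in states]
disc = TransitionDiscriminator(state_dim=1)
for _ in range(500):
    disc.update(expert, imitator)
print(sum(disc.reward(s, s2) for s, s2 in expert) / len(expert))
print(sum(disc.reward(s, s2) for s, s2 in imitator) / len(imitator))
```

After training, the expert's forward-moving transitions should receive the larger average reward, even though neither batch contains any action. In a full GAIfO-style loop, the imitator's policy would be updated to increase this reward while the discriminator is retrained on fresh rollouts.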

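This paper proposes GAIfO, a method for imitation learning from observation based on generative adversarial networks that learns directly from state-only demonstrations without action information. Experiments in two different settings show that it outperforms existing methods that learn directly from state-only demonstrations in high-dimensional simulation environments.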