The imitation learning research community has recently made significant progress towards enabling artificial agents to imitate behaviors from video demonstrations alone. However, current state-of-the-art approaches to this problem exhibit high sample complexity due, in part, to the high-dimensional nature of video observations. To address this issue, we introduce a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO). At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer, which provides estimates of lower-dimensional proprioceptive state representations from high-dimensional images. We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other imitation from observation (IfO) algorithms at learning from video-only demonstrations, and that it can sometimes even achieve performance close to that of the Generative Adversarial Imitation from Observation (GAIfO) algorithm, which has privileged access to the demonstrator's proprioceptive state information.

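Summary: The paper introduces a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO). The algorithm uses a self-supervised state observer to estimate lower-dimensional proprioceptive state representations from high-dimensional images. This lets it learn more sample-efficiently from video-only demonstrations, and it can sometimes achieve performance close to that of the GAIfO algorithm, which has privileged access to the demonstrator's proprioceptive state information.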
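
The state-observer idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the observer is fit on image and proprioceptive-state pairs collected from the imitating agent's own interaction with the environment (one possible reading of "self-supervised"), and it swaps in a linear ridge regressor on synthetic data for whatever network the paper uses. The names fit_linear_observer, observe, and render are hypothetical.

```python
import numpy as np

def fit_linear_observer(images, states, ridge=1e-3):
    """Fit a ridge-regression map from flattened images to states.

    Assumption: a linear model stands in for a learned neural observer.
    """
    X = images.reshape(len(images), -1)
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ states)

def observe(W, images):
    """Estimate low-dimensional states from images with the fitted map."""
    X = images.reshape(len(images), -1)
    X = np.hstack([X, np.ones((len(X), 1))])
    return X @ W

rng = np.random.default_rng(0)
state_dim, height, width = 3, 8, 8

# Hypothetical "renderer": images are a noisy linear function of the state.
render = rng.normal(size=(state_dim, height * width))

def make_images(states):
    pixels = states @ render + 0.01 * rng.normal(size=(len(states), height * width))
    return pixels.reshape(-1, height, width)

# Agent data: the agent can read its own proprioceptive state, so each
# image it sees comes with a label for free.
agent_states = rng.normal(size=(500, state_dim))
agent_images = make_images(agent_states)
W = fit_linear_observer(agent_images, agent_states)

# Demonstration data: only video frames are available at imitation time.
demo_states = rng.normal(size=(50, state_dim))  # hidden ground truth
demo_images = make_images(demo_states)
estimated_states = observe(W, demo_images)

# Consecutive state estimates form the (s_t, s_{t+1}) transitions that a
# GAIfO-style discriminator would consume instead of raw images.
transitions = np.hstack([estimated_states[:-1], estimated_states[1:]])
mse = float(np.mean((estimated_states - demo_states) ** 2))
```

In this toy setting the discriminator input shrinks from 64 pixels per frame to a 6-dimensional transition vector, which is the kind of dimensionality reduction the abstract credits for the improved sample efficiency.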

Adversarial Imitation Learning from Video using a State Observer