Generative Adversarial Imitation Learning

Jun, 2016

Jonathan Ho, Stefano Ermon

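TL;DR: Proposes a new framework that extracts a policy directly from expert demonstration data, draws an analogy between imitation learning and generative adversarial networks, derives a model-free imitation learning algorithm from that analogy, and shows that it clearly outperforms existing model-free imitation learning methods when imitating complex behaviors in large, high-dimensional environments.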
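To make the GAN analogy concrete, here is a minimal sketch of the adversarial piece: a discriminator D(s, a) trained to tell learner state-action pairs from expert ones by ascending E_pi[log D] + E_expert[log(1 - D)], whose output then serves as the per-step cost log D(s, a) for the policy. Several things are assumptions made for illustration only: D is a linear logistic classifier on concatenated state-action vectors (the paper uses neural networks), the data are synthetic Gaussian samples, the policy update itself (TRPO in the paper) is omitted, and helper names such as `discriminator_step` and `surrogate_cost` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminator(w, b, sa):
    """D(s, a): probability that a state-action pair came from the learner policy
    rather than the expert. The linear logistic model is an illustrative assumption."""
    return sigmoid(sa @ w + b)


def discriminator_objective(w, b, policy_sa, expert_sa):
    """Sample estimate of E_pi[log D(s,a)] + E_expert[log(1 - D(s,a))]."""
    return (np.mean(np.log(discriminator(w, b, policy_sa)))
            + np.mean(np.log(1.0 - discriminator(w, b, expert_sa))))


def discriminator_step(w, b, policy_sa, expert_sa, lr=0.1):
    """One gradient-ascent step on discriminator_objective (hypothetical helper)."""
    d_pi = discriminator(w, b, policy_sa)
    d_ex = discriminator(w, b, expert_sa)
    # d/dz log(sigmoid(z)) = 1 - sigmoid(z); d/dz log(1 - sigmoid(z)) = -sigmoid(z)
    grad_w = (((1.0 - d_pi)[:, None] * policy_sa).mean(axis=0)
              - (d_ex[:, None] * expert_sa).mean(axis=0))
    grad_b = np.mean(1.0 - d_pi) - np.mean(d_ex)
    return w + lr * grad_w, b + lr * grad_b


def surrogate_cost(w, b, sa):
    """Per-step cost log D(s,a). A policy step (omitted here) would minimize its
    expectation, pushing the learner toward pairs the discriminator deems expert-like."""
    return np.log(discriminator(w, b, sa))


# Toy usage with synthetic data (an assumption, not data from the paper):
# 4-dimensional concatenated (state, action) vectors.
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, scale=0.5, size=(256, 4))
policy_sa = rng.normal(loc=-1.0, scale=0.5, size=(256, 4))

w, b = np.zeros(4), 0.0
initial_obj = discriminator_objective(w, b, policy_sa, expert_sa)  # 2 * log(0.5)
for _ in range(200):
    w, b = discriminator_step(w, b, policy_sa, expert_sa)
final_obj = discriminator_objective(w, b, policy_sa, expert_sa)
```

In a full imitation loop this discriminator update would alternate with a policy update that uses `surrogate_cost` as its cost signal, so the discriminator plays the role of the GAN discriminator and the policy plays the role of the generator.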

Abstract

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning.