BriefGPT.xyz
Jun, 2016
Generative Adversarial Imitation Learning
Jonathan Ho, Stefano Ermon
TL;DR
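Proposes a new framework for extracting a policy directly from data of expert behavior. The paper draws an analogy between imitation learning and generative adversarial networks, derives a model-free imitation learning algorithm from it, and shows that the algorithm achieves significant performance gains over existing model-free imitation learning methods when imitating complex behaviors in large, high-dimensional environments.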
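To make the analogy with generative adversarial networks concrete, here is a minimal sketch of the discriminator side of such a setup. It is illustrative only and not taken from this page: the linear discriminator, the function names `discriminator_loss` and `surrogate_cost`, and the sign convention (D close to 1 for policy-generated state-action pairs, close to 0 for expert pairs) are assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def discriminator_loss(w, policy_sa, expert_sa, eps=1e-8):
    """Binary cross-entropy for a hypothetical linear discriminator.

    D(s, a) = sigmoid(w . [s, a]) is trained to output values near 1 on
    state-action pairs from the current policy and near 0 on expert pairs
    (assumed convention). Rows of policy_sa / expert_sa are feature vectors.
    """
    d_policy = sigmoid(policy_sa @ w)
    d_expert = sigmoid(expert_sa @ w)
    return -(np.mean(np.log(d_policy + eps))
             + np.mean(np.log(1.0 - d_expert + eps)))


def surrogate_cost(w, sa, eps=1e-8):
    """Per-pair cost log D(s, a) handed to the policy's RL step.

    The cost is lower for pairs the discriminator finds expert-like, so a
    policy that minimizes it is pushed toward the expert's behavior.
    """
    return np.log(sigmoid(sa @ w) + eps)
```

In a full training loop, one would alternate between a gradient step that lowers `discriminator_loss` and a model-free policy update that lowers the expected `surrogate_cost`. The policy update itself is omitted here.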
Abstract
Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning ...