BriefGPT.xyz
Jun, 2020
Intrinsic Reward Driven Imitation Learning via Generative Model
Xingrui Yu, Yueming Lyu, Ivor W. Tsang
TL;DR
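We introduce a new reward learning module that generates intrinsic reward signals via a generative model. Our generative method performs better forward state transition and backward action encoding. This improves the module's ability to model the environment's dynamics and gives the imitation agent both the demonstrator's intrinsic intention and better exploration ability. Empirically, our model outperforms existing IRL methods on multiple Atari games; even with only one demonstration, it achieves 5 times the performance of the demonstration.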
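As a rough illustration of how a learned dynamics model can yield an intrinsic reward, here is a minimal Python sketch that scores a transition by the prediction error of a learned forward model. It is a swapped-in simplification (a curiosity-style linear forward model trained by SGD), not the paper's generative module: it omits the backward action encoding entirely. The names `ForwardModel`, `intrinsic_reward`, `step`, and the toy dynamics in `DELTAS` are all hypothetical.

```python
import random


def one_hot(a, n):
    """Encode discrete action a as a length-n one-hot list."""
    v = [0.0] * n
    v[a] = 1.0
    return v


class ForwardModel:
    """Hypothetical linear forward model: predicts s' from [s, one_hot(a)].

    Illustration only; the paper uses a generative model with forward
    state transition and backward action encoding, which is not shown here.
    """

    def __init__(self, state_dim, n_actions, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        self.lr = lr
        in_dim = state_dim + n_actions
        # Small random initial weights, one row per predicted state component.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(state_dim)]

    def _features(self, s, a):
        return list(s) + one_hot(a, self.n_actions)

    def predict(self, s, a):
        x = self._features(s, a)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def intrinsic_reward(self, s, a, s_next):
        """Squared prediction error of s', used as the intrinsic reward."""
        pred = self.predict(s, a)
        return sum((p - t) ** 2 for p, t in zip(pred, s_next))

    def update(self, s, a, s_next):
        """One SGD step on 0.5 * squared prediction error."""
        x = self._features(s, a)
        pred = self.predict(s, a)
        for i, row in enumerate(self.W):
            err = pred[i] - s_next[i]
            for j in range(len(row)):
                row[j] -= self.lr * err * x[j]


# Hypothetical toy dynamics for "demonstration" data: each action shifts the state.
DELTAS = [[0.1, 0.0], [0.0, 0.1]]


def step(s, a):
    return [s[0] + DELTAS[a][0], s[1] + DELTAS[a][1]]


model = ForwardModel(state_dim=2, n_actions=2)
s_test, a_test = [0.5, -0.5], 0
r_before = model.intrinsic_reward(s_test, a_test, step(s_test, a_test))

rng = random.Random(1)
for _ in range(3000):
    s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    a = rng.randrange(2)
    model.update(s, a, step(s, a))

r_after = model.intrinsic_reward(s_test, a_test, step(s_test, a_test))
# A transition that does not follow the learned dynamics stays "surprising".
r_novel = model.intrinsic_reward(s_test, a_test, [s_test[0] + 0.5, s_test[1] + 0.5])
print(f"before={r_before:.4f} after={r_after:.6f} novel={r_novel:.4f}")
```

After training, transitions that match the learned dynamics earn almost no intrinsic reward, while a transition that deviates from them still earns a large one. That contrast is the kind of signal that can push an agent to explore beyond what it has already modeled.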
Abstract
Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such a high-dimensional environment, e.g., Atari domai