BriefGPT.xyz
Jun, 2021
Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage
Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
TL;DR
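Building on static offline data, this work proposes MILO, a framework and algorithm for efficiently solving imitation learning without online interaction. It handles the state-action distribution shift that arises when the offline data comes from a weaker behavior policy, and it successfully imitates the actions of a high-performing (expert) policy.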
Abstract
This paper studies offline imitation learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions. Instead, the learner is presented with a static offli