Jun, 2024
A Dual Approach to Imitation Learning from Observations with Offline Datasets
Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum
TL;DR
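We propose DILO (Dual Imitation Learning from Observations), an algorithm that learns a multi-step utility function quantifying how each action affects the divergence between the agent's and the expert's visitation distributions. DILO can learn an imitation policy from arbitrary suboptimal data without requiring expert actions, which lets it handle high-dimensional observations effectively and achieve better performance.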
Abstract
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult. However, demonstrating expert behavior in the action space of the