BriefGPT.xyz
Jun, 2023
Sample-Efficient On-Policy Imitation Learning from Observations
João A. Cândido Ramos, Lionel Blondé, Naoya Takeishi, Alexandros Kalousis
TL;DR
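This paper proposes SEILO, a novel sample-efficient on-policy algorithm for imitation learning from observations (ILO). SEILO combines standard adversarial imitation learning with inverse dynamics modeling, so the agent receives feedback from both an adversarial procedure and a behavior cloning loss. The authors show empirically that the proposed algorithm needs fewer interactions with the environment to reach expert performance than other state-of-the-art on-policy ILO and ILD methods.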
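To make the two feedback signals concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the linear discriminator, its weights, the reward form `-log(1 - D)`, the one-dimensional dynamics `s' = s + a`, and every function name are assumptions chosen only for illustration. It shows an adversarial reward computed on state transitions, and a behavior cloning loss whose action labels come from an inverse dynamics model, since the expert data contains observations only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(s, s_next, w=(1.0, -1.0), b=0.0):
    """Hypothetical linear discriminator over state transitions (s, s').

    Returns a score in (0, 1). Here a higher score is taken to mean
    "more expert-like", which is an assumed convention.
    """
    return sigmoid(w[0] * s + w[1] * s_next + b)


def adversarial_reward(s, s_next):
    """GAIL-style reward -log(1 - D(s, s')), one common choice among several."""
    d = discriminator(s, s_next)
    return -math.log(1.0 - d + 1e-8)


def inverse_dynamics(s, s_next):
    """Toy inverse dynamics model: assumes the dynamics are s' = s + a."""
    return s_next - s


def bc_loss(policy, expert_transitions):
    """Mean squared error between policy actions and inferred expert actions.

    The expert actions are not observed; they are inferred from each
    (s, s') pair by the inverse dynamics model.
    """
    total = 0.0
    for s, s_next in expert_transitions:
        inferred_action = inverse_dynamics(s, s_next)
        total += (policy(s) - inferred_action) ** 2
    return total / len(expert_transitions)


# Expert observations only: consecutive states, no actions.
expert_transitions = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]


def policy(s):
    # A deliberately poor constant policy, so the loss is non-zero.
    return 0.5


loss = bc_loss(policy, expert_transitions)
reward = adversarial_reward(0.0, 1.0)
```

In a real agent both pieces would be learned: the discriminator from expert versus agent transitions, and the inverse dynamics model from the agent's own interaction data. How the two signals are weighted and scheduled is not stated in this summary.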
Abstract
Imitation learning from demonstrations (ILD) aims to alleviate numerous shortcomings of reinforcement learning through the use of demonstrations