BriefGPT.xyz
Feb, 2021
Off-Policy Imitation Learning from Observations
Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou
TL;DR
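This paper proposes a method for learning from observations that combines distribution matching, off-policy learning, and an inverse action model, and that is comparable to state-of-the-art methods in both performance and sample efficiency.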
Abstract
Learning from observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit through the reuse of incomplete resources. Compared to conventional imitation learning (IL)