Apr, 2022
Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning
Carl Qi, Pieter Abbeel, Aditya Grover
TL;DR
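Proposes IMPLANT, a new meta-algorithm for imitation learning that uses decision-time planning to correct the compounding errors of the imitation policy. In experiments it outperforms benchmark imitation learning methods, including when run under challenging test-time dynamics.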
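The TL;DR describes correcting an imitation policy's compounding errors with decision-time planning. The sketch below is not IMPLANT itself: it is a generic random-shooting, receding-horizon planner built on assumptions of our own (a known 1-D dynamics model, a hand-written stand-in for a learned reward, and a deliberately biased policy, all hypothetical). It only illustrates the general shape of the idea: the imitation policy proposes actions, a reward model scores short lookahead rollouts, and only the first action of the best rollout is executed before re-planning.

```python
import random


def dynamics(state: float, action: float) -> float:
    """Hypothetical known 1-D dynamics: the action shifts the state."""
    return state + action


def learned_reward(state: float) -> float:
    """Hypothetical stand-in for an IRL-inferred reward, highest at state 0."""
    return -abs(state)


def imitation_policy(state: float) -> float:
    """Hypothetical imitation policy with a small systematic bias (+0.3).

    Rolled out on its own it settles at state 0.6 instead of the expert's 0.
    """
    return -0.5 * state + 0.3


def plan_action(state, rng, horizon=5, num_candidates=64, noise=0.5):
    """Random-shooting decision-time planning around the imitation policy.

    Candidate k follows the policy with Gaussian action noise (candidate 0 is
    the noise-free policy), is scored with the learned reward over `horizon`
    model steps, and only the first action of the best candidate is returned.
    """
    best_return = float("-inf")
    best_first_action = imitation_policy(state)
    for k in range(num_candidates):
        s, total, first_action = state, 0.0, None
        for _ in range(horizon):
            eps = 0.0 if k == 0 else rng.gauss(0.0, noise)
            a = imitation_policy(s) + eps
            if first_action is None:
                first_action = a
            s = dynamics(s, a)
            total += learned_reward(s)
        if total > best_return:
            best_return, best_first_action = total, first_action
    return best_first_action


def rollout(start, steps, choose_action):
    """Execute `choose_action` in the true dynamics; return visited states."""
    states, s = [], start
    for _ in range(steps):
        s = dynamics(s, choose_action(s))
        states.append(s)
    return states


def mean_abs(xs):
    """Average distance from the (hypothetical) expert target state 0."""
    return sum(abs(x) for x in xs) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    policy_states = rollout(3.0, 20, imitation_policy)
    planned_states = rollout(3.0, 20, lambda s: plan_action(s, rng))
    print(f"policy only : mean |state| = {mean_abs(policy_states):.3f}")
    print(f"with planner: mean |state| = {mean_abs(planned_states):.3f}")
```

With these toy definitions the policy alone converges to its biased fixed point 0.6, while re-planning at every step lets the reward model pull the state back toward 0. The paper's own choices of reward, dynamics model, and candidate actions may differ from this sketch.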
Abstract
The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approaches infers the (unknown) reward function via inverse reinforcement learning …