BriefGPT.xyz
Jun, 2023
Divide and Repair: Using Options to Improve Performance of Imitation Learning Against Adversarial Demonstrations
Prithviraj Dasgupta
TL;DR
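The paper proposes a new technique that identifies the parts of demonstration trajectories that are useful for learning and learns from them, while excluding the parts that an adversary has modified, in order to improve learning efficiency. When the demonstration trajectories are subjected to adversarial attacks of different types and degrees, the algorithm effectively prevents learning performance from degrading.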
Abstract
We consider the problem of learning to perform a task from demonstrations given by teachers or experts, when some of the experts' demonstrations ...