Training an Interactive Helper

Jun, 2019

Mark Woodward, Chelsea Finn, Karol Hausman

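TL;DR: This paper proposes a meta-learning strategy in which a "helper" agent is trained, through interaction with a "prime" agent, to maximize the prime agent's reward without observing that reward or receiving explicit demonstrations. It also introduces a set of cooperative foraging tasks in which the trained helper, relying on physical communication, quickly infers and collects the correct objects.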

Abstract

Developing agents that can quickly adapt their behavior to new tasks remains a challenge. Meta-learning has been applied to this problem, but previous methods require either specifying a reward function, which can be tedious, or providing demonstrations, which can be inefficient. In this