OpenAI, Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya...
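TL;DR: Without relying on human priors, we use asymmetric self-play to train Bob, who needs only sparse rewards and can also learn from Alice's trajectories, unifying goal discovery and robotic manipulation control.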
Abstract
We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for ...