Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself. We also point out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty. This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple. The skills include behaviors such as running, blocking, ducking, tackling, fooling opponents, kicking, and defending using both arms and legs. Highlights of the learned behaviors can be found here: https://goo.gl/eR7fbX

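This paper introduces several competitive multi-agent environments set in a 3D world, in which self-play training produces agents with a rich variety of complex skills. We also point out that self-play can produce behaviors more complex than the environment itself, and that it comes with a built-in curriculum that helps agents learn skills at each level of difficulty.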
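
The natural-curriculum argument can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a logistic (Elo-style) model of win probability over a scalar skill value, and it uses the entropy of the win/loss outcome as a rough proxy for how informative a match is. The names `win_probability`, `outcome_entropy`, `compare`, and `FIXED_OPPONENT_SKILL` are hypothetical. Under these assumptions, an opponent of equal skill always gives a maximally uncertain match, while a fixed opponent becomes uninformative as the agent improves.

```python
import math

# Assumption: skill is a single number and match outcomes follow a logistic
# (Elo-style) model. This is an illustrative model, not the paper's setup.
FIXED_OPPONENT_SKILL = 0.0  # hypothetical hand-designed opponent


def win_probability(agent_skill: float, opponent_skill: float) -> float:
    """Chance that the agent beats the opponent under the logistic model."""
    return 1.0 / (1.0 + math.exp(opponent_skill - agent_skill))


def outcome_entropy(p_win: float) -> float:
    """Entropy in bits of a win/loss outcome; it peaks at p_win = 0.5."""
    if p_win in (0.0, 1.0):
        return 0.0
    return -(p_win * math.log2(p_win) + (1.0 - p_win) * math.log2(1.0 - p_win))


def compare(agent_skills):
    """Return (skill, entropy vs. fixed opponent, entropy vs. equal-skill opponent)."""
    rows = []
    for skill in agent_skills:
        p_fixed = win_probability(skill, FIXED_OPPONENT_SKILL)
        p_self = win_probability(skill, skill)  # self-play: opponent matches the agent
        rows.append((skill, outcome_entropy(p_fixed), outcome_entropy(p_self)))
    return rows


if __name__ == "__main__":
    for skill, h_fixed, h_self in compare([0.0, 2.0, 4.0, 8.0]):
        print(f"skill={skill:4.1f}  fixed-opponent bits={h_fixed:.3f}  self-play bits={h_self:.3f}")
```

In this toy model the self-play column stays at one bit for every skill level, whereas the fixed-opponent column shrinks toward zero as the agent's skill grows, which mirrors the claim that an environment populated by agents of the current skill level has the right level of difficulty.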

Emergent Complexity via Multi-Agent Competition