We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our study highlights several of the challenges encountered in large scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to co-operative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded by game theoretic principals, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines.

通过引入具有连续模拟物理的具有挑战性的竞争性多智能体足球环境，我们研究了加强学习智能体中合作行为的出现。我们演示了分散、基于人口的联合训练能够导致代理行为的进步：从随机的行为到简单的球追逐，最终呈现出合作的迹象。我们进一步应用了一个由博弈论原理支持的评估方案，可以在没有预定义评估任务或人类基准的情况下评估代理的性能。

竞争中的协同涌现