This work extends the existing virtual multi-agent platform RoboSumo to create TripleSumo, a platform for investigating multi-agent cooperative behaviors in continuous action spaces with physical contact in an adversarial environment. We investigate a scenario in which two agents, `Bug' and `Ant', must team up to push a third agent, `Spider', out of the arena. To this end, the newly added agent `Bug' is trained during an ongoing match between `Ant' and `Spider'. `Bug' must develop awareness of the other agents' actions, infer the strategy of both sides, and eventually learn an action policy to cooperate. Training uses the reinforcement learning algorithm Deep Deterministic Policy Gradient (DDPG) with a hybrid reward structure that combines dense and sparse rewards. Cooperative behavior is evaluated quantitatively by the mean probability of winning the match and the mean number of steps needed to win.
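
As a rough illustration of the two quantitative ideas named above, the following Python sketch shows one possible shape for a hybrid reward (a dense per-step term plus a sparse terminal bonus) and for the two evaluation metrics. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Episode` record, the function names, the choice of Spider's distance from the arena centre as the dense signal, and the values of `dense_scale` and `win_bonus`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple


@dataclass
class Episode:
    """Hypothetical record of one finished match."""
    won: bool   # True if the Bug/Ant team pushed Spider out of the arena
    steps: int  # number of simulation steps the match lasted


def hybrid_reward(radius_before: float, radius_after: float,
                  done: bool, won: bool,
                  dense_scale: float = 1.0, win_bonus: float = 100.0) -> float:
    """Dense shaping term plus sparse terminal bonus (assumed form)."""
    # Dense part (assumption): reward any increase in Spider's distance
    # from the arena centre during this step.
    dense = dense_scale * (radius_after - radius_before)
    # Sparse part (assumption): a one-off bonus only when the match ends in a win.
    sparse = win_bonus if (done and won) else 0.0
    return dense + sparse


def evaluate(episodes: Sequence[Episode]) -> Tuple[float, Optional[float]]:
    """Return (mean win probability, mean steps over won matches)."""
    if not episodes:
        raise ValueError("need at least one episode")
    win_rate = mean(1.0 if e.won else 0.0 for e in episodes)
    winning_steps = [e.steps for e in episodes if e.won]
    # Mean steps-to-win is undefined when no match was won.
    mean_steps = mean(winning_steps) if winning_steps else None
    return win_rate, mean_steps
```

Averaging the step count over won matches only is one reading of "steps needed to win"; a different convention, such as counting lost matches at the step limit, would change the second metric.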

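This paper presents TripleSumo, an extension of a virtual multi-agent platform for studying multi-agent cooperative behavior in continuous action spaces, with physical contact in an adversarial environment. We study a scenario in which two agents, Bug and Ant, cooperate against Spider. The newly added agent Bug is trained with the reinforcement learning algorithm DDPG under a hybrid reward structure, and the resulting cooperative behavior is evaluated quantitatively.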

Learning Cooperative Behaviors in Adversarial Multi-Agent Systems