Although single-agent deep reinforcement learning has achieved significant success thanks to the experience replay mechanism, several concerns must be revisited in multiagent environments. This work focuses on stochastic cooperative environments. We adapt a recently proposed weighted double estimator and propose a multiagent deep reinforcement learning framework named Weighted Double Deep Q-Network (WDDQN). To achieve efficient cooperation, a \textit{Lenient Reward Network} and a \textit{Mixture Replay Strategy} are introduced. By combining a deep neural network with the weighted double estimator, WDDQN not only reduces bias effectively but can also be extended to many deep RL scenarios that take only raw pixel images as input. Empirically, WDDQN outperforms an existing DRL algorithm (double DQN) and a multiagent RL algorithm (lenient Q-learning) in terms of performance and convergence in stochastic cooperative environments.

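This paper proposes WDDQN, a multiagent deep reinforcement learning framework that uses a weighted double estimator and a deep neural network to reduce bias effectively in scenarios with raw visual input. It introduces a lenient reward network and a scheduled replay strategy to achieve effective cooperation among agents. Experiments show that, in stochastic cooperative environments, WDDQN outperforms existing DRL and multiagent RL algorithms, namely double DQN and lenient Q-learning, in both average reward and convergence speed.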

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments