In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration. While the computation of many intrinsic rewards relies on estimating variational posteriors using neural network approximators, a notable challenge has surfaced due to the limited expressive capability of these neural statistics approximators. We pinpoint this challenge as the "revisitation" issue, where agents recurrently explore confined areas of the task space. To combat this, we propose a dynamic reward scaling approach. This method is crafted to stabilize the significant fluctuations in intrinsic rewards in previously explored areas and promote broader exploration, effectively curbing the revisitation phenomenon. Our experimental findings underscore the efficacy of our approach, showcasing enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks, especially in sparse reward settings.

在多智能体强化学习领域，内在动机作为一种重要的探索工具已经出现。我们提出了一种动态奖励缩放方法，以应对神经网络统计近似器的有限表达能力所带来的挑战，并有效控制多次重复访问任务空间的现象，在Google Research Football和StarCraft II微管理任务等挑战性环境中展示了改进的性能，尤其是在稀疏奖励设置下。

多智能体强化学习中避免重复探索