Efficient exploration remains a challenging problem in reinforcement
learning, especially in tasks where environment rewards are sparse. A
common approach to exploring such environments is to introduce an
"intrinsic" reward. In this work, we focus on model uncert
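In multi-agent reinforcement learning, intrinsic motivation has emerged
as an important tool for exploration. We propose a dynamic reward
scaling method that addresses the challenges posed by the limited
expressive capacity of neural-network statistical approximators and
effectively controls repeated revisits to the task space. The method
shows improved performance in challenging environments such as Google
Research Football and StarCraft II micromanagement tasks, especially in
sparse-reward settings.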