Intrinsically motivated reinforcement learning aims to address the exploration challenge in sparse-reward tasks. However, exploration methods for transition-dependent multi-agent settings are largely absent from the literature. We take a step towards solving this problem by presenting two exploration methods, exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), which exploit the role of interaction in the coordinated behaviors of agents. EITI uses mutual information to capture the influence of one agent's behavior on the transition dynamics of other agents. EDTI uses a novel intrinsic reward, called Value of Interaction (VoI), to characterize and quantify the influence of one agent's behavior on the expected returns of other agents. By optimizing the EITI or EDTI objective as a regularizer, agents are encouraged to coordinate their exploration and learn policies that optimize team performance. We show how to optimize these regularizers so that they can be easily integrated with policy gradient reinforcement learning. The resulting update rule draws a connection between coordinated exploration and intrinsic reward distribution. Finally, we empirically demonstrate the significant strength of our methods in a variety of multi-agent scenarios.

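To address the exploration challenge in tasks that require exploration, this paper proposes two exploration methods for transition-dependent multi-agent settings: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI). Both exploit the role of interaction in the coordinated behaviors of agents. Optimizing these two objectives encourages agents to coordinate their exploration and learn policies, and experiments in multi-agent environments demonstrate the efficiency of our methods.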

Influence-Based Multi-Agent Exploration