Value decomposition methods have gradually become popular in cooperative multi-agent reinforcement learning. However, almost all value decomposition methods follow the Individual Global Max (IGM) principle or its variants, which restricts the range of problems these methods can solve. Inspired by the notion of dual self-awareness in psychology, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise. Each agent consists of an ego policy that carries out actions and an alter ego value function that takes part in credit assignment. By using an explicit search procedure, the value function factorization can ignore the IGM assumption. We also propose a novel anti-ego exploration mechanism to keep the algorithm from becoming stuck in a local optimum. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.
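To make the IGM restriction concrete, the following minimal sketch compares the joint action obtained from per-agent greedy choices with the one found by an explicit search over the joint value table. Everything in it is an illustrative assumption rather than the proposed method: the payoff matrix is a hypothetical two-agent game, the per-agent utilities are simple averages of the joint values, and brute-force enumeration stands in for a search procedure.

```python
import numpy as np


def greedy_joint_action(individual_qs):
    """Joint action formed by each agent maximizing its own utility (the IGM-style choice)."""
    return tuple(int(np.argmax(q)) for q in individual_qs)


def searched_joint_action(joint_q):
    """Joint action found by exhaustively searching the joint value table."""
    flat_index = np.argmax(joint_q)
    return tuple(int(i) for i in np.unravel_index(flat_index, joint_q.shape))


def satisfies_igm(joint_q, individual_qs):
    """True if the per-agent greedy actions coincide with the joint argmax."""
    return greedy_joint_action(individual_qs) == searched_joint_action(joint_q)


# Hypothetical joint values for a two-agent, three-action matrix game (illustrative numbers).
joint_q = np.array([
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
])

# Assumed per-agent utilities: each agent averages the joint values over the other agent's actions.
individual_qs = [joint_q.mean(axis=1), joint_q.mean(axis=0)]

greedy = greedy_joint_action(individual_qs)
searched = searched_joint_action(joint_q)
print("greedy joint action:", greedy, "value:", joint_q[greedy])
print("searched joint action:", searched, "value:", joint_q[searched])
print("IGM holds:", satisfies_igm(joint_q, individual_qs))
```

In this toy game the per-agent greedy choice settles on a joint action worth 0, while the search over joint actions finds the one worth 8, so these utilities do not satisfy IGM. Exhaustive enumeration grows exponentially with the number of agents, so it serves here only to illustrate the gap.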

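For cooperative multi-agent reinforcement learning, we propose a value decomposition framework based on the notion of dual self-awareness that entirely rejects the Individual Global Max (IGM) principle. By using an explicit search procedure, the value function factorization can ignore the IGM assumption. We also propose a novel anti-ego exploration mechanism to keep the algorithm from becoming stuck in a local optimum. As the first value decomposition method that does not follow the IGM rule at all, our proposed framework achieves desirable performance in various cooperative tasks.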

Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning