Offline reinforcement learning (RL), which learns policies from pre-collected
datasets without environment interaction, has received considerable attention in
recent years. Compared with the rich literature in the single-agent case,
offline multi-agent RL remains relatively underexplored. Most existing
methods directly apply offline RL ingredients to the multi-agent setting
without fully exploiting the decomposable problem structure, which leads to
unsatisfactory performance on complex tasks. We present OMAC, a new offline
multi-agent RL algorithm with coupled value factorization. OMAC decomposes
the global value function into local and shared components while maintaining
credit-assignment consistency between the state-value and Q-value functions.
Moreover, OMAC
performs in-sample learning on the decomposed local state-value functions,
which implicitly performs the max-Q operation at the local level while avoiding
the distributional shift caused by evaluating out-of-distribution actions.
Through comprehensive evaluations on offline multi-agent StarCraft II
micro-management tasks, we demonstrate the superior performance of OMAC over
state-of-the-art offline multi-agent RL methods.
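
The two ideas named above can be illustrated with a small sketch. It is not the OMAC implementation: the abstract does not specify the loss or the mixing form, so the sketch assumes (i) expectile regression as one common way to do in-sample learning of a local state-value function, and (ii) a linear mixing in which the global V and Q share the same per-agent weights and the same shared term. All function names (`expectile_loss`, `fit_local_value`, `mix_values`) and the parameter `tau` are hypothetical.

```python
import numpy as np


def expectile_loss(q, v, tau=0.8):
    """Asymmetric squared loss between dataset Q-values and a value estimate.

    Assumption: expectile regression stands in for the in-sample objective.
    With tau > 0.5, samples where q > v are weighted more heavily, so the
    minimizing v moves toward the larger in-dataset Q-values without ever
    querying actions outside the dataset.
    """
    diff = np.asarray(q, dtype=float) - v
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return float(np.mean(weight * diff ** 2))


def fit_local_value(q_samples, tau=0.8, steps=2000, lr=0.1):
    """Fit a scalar local value to in-dataset Q samples by gradient descent."""
    q_samples = np.asarray(q_samples, dtype=float)
    v = float(np.mean(q_samples))
    for _ in range(steps):
        diff = q_samples - v
        weight = np.where(diff > 0, tau, 1.0 - tau)
        grad = -2.0 * np.mean(weight * diff)  # derivative of the loss w.r.t. v
        v -= lr * grad
    return v


def mix_values(local_values, weights, shared):
    """Combine local values into a global value (assumed linear mixing).

    local_values, weights: arrays of shape (batch, n_agents)
    shared: array of shape (batch,), the shared component
    """
    local_values = np.asarray(local_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    shared = np.asarray(shared, dtype=float)
    return np.sum(weights * local_values, axis=1) + shared


if __name__ == "__main__":
    # Q-values of the actions one agent actually took in one state.
    q = np.array([0.0, 1.0, 2.0, 3.0])
    print(fit_local_value(q, tau=0.5))  # the mean of the samples
    print(fit_local_value(q, tau=0.9))  # pulled toward the in-sample maximum

    # Using the same weights and shared term for V and Q keeps the
    # per-agent credit (the weights) identical across both functions.
    w = np.array([[0.5, 0.5]])
    shared = np.array([1.0])
    v_tot = mix_values(np.array([[1.0, 2.0]]), w, shared)
    q_tot = mix_values(np.array([[2.0, 3.0]]), w, shared)
    print(v_tot, q_tot)
```

As `tau` approaches 1, the fitted local value approaches the largest Q-value seen in the data for that state, which is the sense in which an in-sample objective can approximate a max-Q operation using dataset actions only.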

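OMAC is a new offline multi-agent reinforcement learning algorithm. It adopts a coupled value factorization scheme that decomposes the global value function into local and shared components and maintains credit-assignment consistency between the state-value and Q-value functions. It performs in-sample learning on the decomposed local state-value functions, which avoids the distributional shift caused by evaluating out-of-distribution actions. Comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks show that OMAC outperforms state-of-the-art offline multi-agent RL methods.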