This paper presents a novel approach to Multi-Agent Reinforcement Learning
(MARL) that combines cooperative task decomposition with the learning of reward
machines (RMs) encoding the structure of the sub-tasks. The proposed method
helps deal with the non-Markovian nature of the rewards in partially observable
environments and improves the interpretability of the learnt policies required
to complete the cooperative task. The RMs associated with each sub-task are
learnt in a decentralised manner and then used to guide the behaviour of each
agent. Doing so reduces the complexity of the cooperative multi-agent problem,
allowing for more effective learning. The results suggest that our
approach is a promising direction for future research in MARL, especially in
complex environments with large state spaces and multiple agents.
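
To make the notion of a reward machine concrete, the following is a minimal sketch of one RM for a single agent's sub-task. It is an illustration under stated assumptions, not the implementation used in this work: the class `RewardMachine`, the state names (`u0`, `u1`, `u_acc`), and the events (`button_pressed`, `goal_reached`) are all hypothetical. The RM here is written by hand, whereas the approach described above learns it.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on high-level events and emit rewards."""

    initial_state: str
    terminal_states: frozenset
    # Maps (state, event) -> (next_state, reward). All entries are hypothetical.
    transitions: dict = field(default_factory=dict)

    def __post_init__(self):
        self.state = self.initial_state

    def reset(self):
        """Return to the initial RM state at the start of an episode."""
        self.state = self.initial_state

    def step(self, event):
        """Advance on an observed event.

        Events with no outgoing transition leave the state unchanged
        and yield zero reward.
        """
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    @property
    def done(self):
        return self.state in self.terminal_states


# Hypothetical sub-task: press a button first, then reach the goal.
rm = RewardMachine(
    initial_state="u0",
    terminal_states=frozenset({"u_acc"}),
    transitions={
        ("u0", "button_pressed"): ("u1", 0.0),
        ("u1", "goal_reached"): ("u_acc", 1.0),
    },
)

# The same event ("goal_reached") is rewarded only after the button press,
# so the reward depends on history rather than on the current event alone.
rewards = [rm.step(e) for e in ["goal_reached", "button_pressed", "goal_reached"]]
```

In this sketch the RM state acts as a compact memory of which part of the sub-task has been completed. An agent that conditions its policy on both its observation and the current RM state can treat a reward that is non-Markovian in the observations as Markovian in the augmented state.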
