In this paper, we study cooperative multi-agent reinforcement learning (MARL) problems in which Reward Machines (RMs) are used to specify the reward functions, so that prior knowledge of high-level events in a task can be leveraged to improve learning efficiency. Existing work has incorporated RMs into MARL for task decomposition and policy learning, but only in relatively simple domains or under the assumption that the agents are independent of one another. In contrast, we present Multi-Agent Reinforcement Learning with a Hierarchy of RMs (MAHRM), which can handle more complex scenarios in which events among the agents can occur concurrently and the agents are highly interdependent. MAHRM exploits the relationships among high-level events to decompose a task into a hierarchy of simpler subtasks, each assigned to a small group of agents, thereby reducing the overall computational complexity. Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods that use the same prior knowledge of high-level events.
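To make the two ingredients of the abstract concrete (an RM as a reward specification over high-level events, and a hierarchy whose subtasks are each handled by a small group of agents), the following is a minimal Python sketch. It is an illustrative assumption, not the MAHRM algorithm: the class and function names (`RewardMachine`, `Subtask`, `run_hierarchy`), the toy events (`press1`, `press2`, `goal`) and the agent names are hypothetical, and no learning is performed. The sketch only replays a fixed trace of event labels. Concurrent events are modeled by labeling each time step with the *set* of events that occurred together.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Label = FrozenSet[str]  # set of high-level events observed in one time step


@dataclass
class RewardMachine:
    """Hypothetical minimal RM: a finite-state machine driven by event labels."""
    initial: str
    terminal: FrozenSet[str]
    # (state, label) -> (next_state, reward); unlisted pairs self-loop with reward 0
    delta: Dict[Tuple[str, Label], Tuple[str, float]]
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def step(self, label: Label) -> float:
        next_state, reward = self.delta.get((self.state, label), (self.state, 0.0))
        self.state = next_state
        return reward

    def done(self) -> bool:
        return self.state in self.terminal


@dataclass
class Subtask:
    rm: RewardMachine
    agents: Tuple[str, ...]  # the small group of agents assigned to this subtask


def run_hierarchy(top: RewardMachine, subtasks: Dict[str, Subtask],
                  trace: List[Label]) -> Dict[str, float]:
    """Replay a label trace; return reward credited to each agent and to the top level."""
    returns: Dict[str, float] = {"top": 0.0}
    for label in trace:
        # A subtask is active if completing it would advance the top-level RM.
        active = [n for n, s in subtasks.items()
                  if (top.state, frozenset({n})) in top.delta and not s.rm.done()]
        for name in active:
            sub = subtasks[name]
            r = sub.rm.step(label)
            for agent in sub.agents:
                returns[agent] = returns.get(agent, 0.0) + r
            if sub.rm.done():
                # Completing a subtask is itself an event for the top-level RM.
                returns["top"] += top.step(frozenset({name}))
    return returns


# Toy task: agent1 and agent2 must press two buttons at the same time,
# after which agent3 must reach the goal.
open_door = Subtask(
    RewardMachine("u0", frozenset({"u1"}),
                  {("u0", frozenset({"press1", "press2"})): ("u1", 1.0)}),
    agents=("agent1", "agent2"),
)
reach_goal = Subtask(
    RewardMachine("v0", frozenset({"v1"}),
                  {("v0", frozenset({"goal"})): ("v1", 1.0)}),
    agents=("agent3",),
)
top = RewardMachine("t0", frozenset({"t2"}), {
    ("t0", frozenset({"open_door"})): ("t1", 0.0),
    ("t1", frozenset({"reach_goal"})): ("t2", 1.0),
})
trace = [frozenset({"press1"}), frozenset({"press1", "press2"}), frozenset({"goal"})]
result = run_hierarchy(top, {"open_door": open_door, "reach_goal": reach_goal}, trace)
print(result)  # {'top': 1.0, 'agent1': 1.0, 'agent2': 1.0, 'agent3': 1.0}
```

In this sketch, `press1` alone does not advance the `open_door` RM; only the concurrent label `{press1, press2}` does, which is the kind of inter-agent dependency the abstract refers to. Each lower-level RM involves only the agents of its own subtask, which is the intuition behind the reduced complexity; how MAHRM actually builds the hierarchy and learns policies is described in the paper, not here.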

Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines