A critical challenge in multi-agent reinforcement learning (MARL) is enabling multiple agents to efficiently accomplish complex, long-horizon tasks. Agents often struggle to cooperate on common goals, divide complex tasks among themselves, and plan through several stages to make progress. We propose to address these challenges by guiding agents with programs designed for parallelization, since programs as a representation contain rich structural and semantic information and are widely used as abstractions for long-horizon tasks. Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over $10+$ stages. E-MAPP integrates the structural information from a parallel program, promotes cooperative behaviors grounded in program semantics, and improves time efficiency via a task allocator. We conduct extensive experiments on a series of challenging, long-horizon cooperative tasks in the Overcooked environment. Results show that E-MAPP outperforms strong baselines by a large margin in completion rate, time efficiency, and zero-shot generalization ability.

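This paper proposes E-MAPP, a multi-agent reinforcement learning framework that uses parallel programs to guide multiple agents to efficiently complete tasks requiring planning over more than 10 stages. The framework integrates the structural information of the program, promotes cooperative behaviors grounded in program semantics, and improves time efficiency through a task allocator. Experiments on a series of complex, long-horizon cooperative tasks in the Overcooked environment show that E-MAPP outperforms strong baselines in completion rate, time efficiency, and zero-shot generalization ability.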
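
To make the idea of a parallel program plus a task allocator concrete, here is a minimal toy sketch. It is not the E-MAPP method: the abstract does not specify how the allocator works, so this example swaps in classical greedy list scheduling over a dependency graph. The names `Subtask` and `schedule`, the fixed per-subtask durations, and the cooking subtasks are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    # Hypothetical stand-in for one program statement (not the paper's DSL).
    name: str
    duration: int                            # time steps once started; assumed >= 1
    deps: set = field(default_factory=set)   # names of subtasks that must finish first


def schedule(subtasks, num_agents):
    """Greedy list scheduling: each idle agent takes a subtask whose deps are done.

    Returns (makespan, assignment), where assignment maps subtask name -> agent id.
    """
    pending = {s.name: s for s in subtasks}
    done = set()
    running = {}        # agent id -> (subtask name, remaining steps)
    assignment = {}
    t = 0
    while pending or running:
        # Hand out ready subtasks to idle agents (alphabetical tie-break).
        for agent in range(num_agents):
            if agent in running:
                continue
            ready = sorted(n for n, s in pending.items() if s.deps <= done)
            if not ready:
                break
            name = ready[0]
            running[agent] = (name, pending.pop(name).duration)
            assignment[name] = agent
        if not running:
            raise ValueError("dependency cycle or unsatisfiable program")
        # Advance one time step and retire finished subtasks.
        t += 1
        for agent, (name, left) in list(running.items()):
            if left == 1:
                done.add(name)
                del running[agent]
            else:
                running[agent] = (name, left - 1)
    return t, assignment


# Toy "parallel program": the two chopping steps are independent of each other.
program = [
    Subtask("chop_onion", 2),
    Subtask("chop_tomato", 2),
    Subtask("cook", 3, {"chop_onion", "chop_tomato"}),
    Subtask("serve", 1, {"cook"}),
]

if __name__ == "__main__":
    for n in (1, 2):
        makespan, assignment = schedule(program, n)
        print(n, "agent(s):", makespan, "steps", assignment)
```

In this toy setting, one agent needs 8 steps while two agents need 6, because the two chopping subtasks run concurrently. The sketch only illustrates why exposing independent branches in a program can reduce completion time; it says nothing about how E-MAPP learns policies or allocates tasks.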

E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance