In this paper, we study the shortest path problem (SPP) with multiple source-destination pairs (MSD), namely MSD-SPP, to minimize average travel time of all shortest paths. The inherent traffic capacity limits within a road network contributes to the competition among vehicles. Multi-agent reinforcement learning (MARL) model cannot offer effective and efficient path planning cooperation due to the asynchronous decision making setting in MSD-SPP, where vehicles (a.k.a agents) cannot simultaneously complete routing actions in the previous time step. To tackle the efficiency issue, we propose to divide an entire road network into multiple sub-graphs and subsequently execute a two-stage process of inter-region and intra-region route planning. To address the asynchronous issue, in the proposed asyn-MARL framework, we first design a global state, which exploits a low-dimensional vector to implicitly represent the joint observations and actions of multi-agents. Then we develop a novel trajectory collection mechanism to decrease the redundancy in training trajectories. Additionally, we design a novel actor network to facilitate the cooperation among vehicles towards the same or close destinations and a reachability graph aimed at preventing infinite loops in routing paths. On both synthetic and real road networks, our evaluation result demonstrates that our approach outperforms state-of-the-art planning approaches.

本文针对多源-目的地最短路径问题（MSD-SPP）进行研究，旨在最小化所有最短路径的平均旅行时间。提出的异步MARL框架通过对道路网络进行分区和引入新的轨迹收集机制，有效解决了路径规划的效率和异步决策问题，实验结果表明该方法在合成和真实道路网络上均优于现有的规划方法。

异步多智能体强化学习的协同路径规划