Multi-agent reinforcement learning (MARL) has shown significant potential in traffic signal control (TSC). However, current MARL-based methods often generalize poorly because they are trained on fixed traffic patterns and road network conditions. As a result, they adapt badly to new traffic scenarios, which leads to high retraining costs and complex deployment. To address this challenge, we propose two algorithms: PLight and PRLight. PLight takes a model-based reinforcement learning approach, pretraining control policies and environment models on predefined source-domain traffic scenarios. The environment model predicts state transitions, which makes it possible to compare environmental features across scenarios. PRLight further improves adaptability by selecting among the pretrained PLight agents according to the similarity between the source and target domains, which accelerates learning in the target domain. We evaluated the algorithms in two transfer settings: (1) adaptation to different traffic scenarios within the same road network, and (2) generalization across different road networks. The results show that PRLight significantly reduces adaptation time in new TSC scenarios compared with learning from scratch, and that it achieves the best performance by exploiting similarities between the available source scenarios and the target scenario.
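
The similarity-based agent selection described above can be illustrated with a minimal sketch. It is not the PRLight implementation: the abstract does not specify the similarity measure, so the sketch assumes that similarity is scored by how well each source-domain environment model predicts transitions observed in the target domain, and that the scores are turned into weights with a softmax. All names (`prediction_error`, `similarity_weights`, `select_source_agent`, `make_linear_model`) and the toy linear dynamics are hypothetical.

```python
import math
import random


def prediction_error(model, transitions):
    """Mean squared error of one source-domain model on target-domain transitions."""
    total, count = 0.0, 0
    for state, action, next_state in transitions:
        predicted = model(state, action)
        for p, y in zip(predicted, next_state):
            total += (p - y) ** 2
            count += 1
    return total / count


def similarity_weights(models, transitions, temperature=1.0):
    """Softmax over negative prediction errors (assumed similarity measure)."""
    logits = [-prediction_error(m, transitions) / temperature for m in models]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def select_source_agent(models, transitions, temperature=1.0):
    """Return the index of the most similar source domain and all weights."""
    weights = similarity_weights(models, transitions, temperature)
    best = max(range(len(weights)), key=lambda i: weights[i])
    return best, weights


def make_linear_model(gain):
    """Hypothetical toy environment model: next_state = state + gain * action."""
    return lambda state, action: [s + gain * a for s, a in zip(state, action)]


# Two hypothetical source-domain models with different dynamics.
source_models = [make_linear_model(0.5), make_linear_model(2.0)]

# Toy target-domain transitions whose dynamics (gain 1.9) are close to model 1.
rng = random.Random(0)
target_transitions = []
for _ in range(32):
    state = [rng.gauss(0.0, 1.0) for _ in range(4)]
    action = [rng.gauss(0.0, 1.0) for _ in range(4)]
    next_state = [s + 1.9 * a for s, a in zip(state, action)]
    target_transitions.append((state, action, next_state))

best, weights = select_source_agent(source_models, target_transitions)
```

In this toy setup the second model has the lower prediction error, so the agent pretrained alongside it would be the one reused. The `temperature` parameter is an assumed knob controlling how sharply the weights favor the closest source domain.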

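This study addresses the poor adaptability of existing multi-agent reinforcement learning methods for traffic signal control, which stems from the fixed traffic patterns and road network conditions used in training. The two proposed algorithms, PLight and PRLight, use pretrained control policies and environment models to improve the system's adaptability to new traffic scenarios and to significantly shorten learning time in those scenarios. Experimental results show that PRLight achieves the best performance across different traffic scenarios and effectively reduces retraining costs.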

Enhancing Traffic Signal Control through Model-Based Reinforcement Learning and Policy Reuse