Transportation networks today suffer from sub-optimal control policies that harm human health and the environment and contribute to traffic congestion. Rising air pollution and longer commutes caused by traffic bottlenecks make intersection traffic signal controllers a crucial component of modern transportation infrastructure. Although several adaptive traffic signal controllers have been proposed in the literature, little research has compared their performance. Furthermore, although carbon dioxide (CO2) emissions are a significant global issue, the literature has paid them limited attention. In this report, we propose EcoLight, a reward shaping scheme for reinforcement learning algorithms that not only reduces CO2 emissions but also achieves competitive results on metrics such as travel time. We compare the performance of tabular Q-Learning, DQN, SARSA, and A2C using travel time, CO2 emissions, waiting time, and stopped time as metrics. Our evaluation covers multiple scenarios that include a range of road users (trucks, buses, cars) with varying pollution levels.
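The abstract does not define the EcoLight reward, so the following is only a minimal sketch of what a CO2-aware reward shaping term could look like. It assumes a common traffic-signal reward (the drop in cumulative vehicle waiting time between decision steps) and adds a penalty proportional to the CO2 emitted during the step. The `VehicleSnapshot` record, the `shaped_reward` function, and the `co2_weight` trade-off parameter are hypothetical names chosen for illustration, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VehicleSnapshot:
    """Hypothetical per-vehicle measurements for one decision step."""
    vclass: str          # road-user type, e.g. "car", "bus", "truck"
    waiting_time: float  # accumulated time spent halted, in seconds
    co2_mg: float        # CO2 emitted during the last step, in milligrams


def shaped_reward(prev_wait: float,
                  vehicles: list[VehicleSnapshot],
                  co2_weight: float = 0.001) -> tuple[float, float]:
    """Return (reward, total_wait) for one step.

    Base term: reduction in total waiting time since the previous step.
    Shaping term: penalty on total CO2 emitted in this step, scaled by the
    hypothetical trade-off weight co2_weight (not a value from the report).
    The caller passes total_wait back in as prev_wait on the next step.
    """
    total_wait = sum(v.waiting_time for v in vehicles)
    total_co2 = sum(v.co2_mg for v in vehicles)
    reward = (prev_wait - total_wait) - co2_weight * total_co2
    return reward, total_wait


# Usage: waiting time fell from 20 s to 15 s (+5), but the step emitted
# 10,000 mg of CO2 (penalty 0.001 * 10,000 = 10), so the reward is -5.0.
vehicles = [
    VehicleSnapshot("car", waiting_time=10.0, co2_mg=2000.0),
    VehicleSnapshot("truck", waiting_time=5.0, co2_mg=8000.0),
]
r, w = shaped_reward(prev_wait=20.0, vehicles=vehicles)
print(r, w)
```

Because heavier road users such as trucks and buses generally emit more CO2 per step than cars, summing raw emissions lets the penalty weigh them more heavily without per-class coefficients; `co2_weight` sets how much travel-time performance the agent is willing to trade for lower emissions.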

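This report introduces EcoLight, a reward shaping scheme for reinforcement learning algorithms that reduces carbon dioxide emissions while achieving competitive results on metrics such as travel time. The study compares the performance of tabular Q-Learning, DQN, SARSA, and A2C using travel time, CO2 emissions, waiting time, and stopped time as metrics, across multiple scenarios that include different road users (trucks, buses, cars) with varying pollution levels.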

Intelligent Traffic Signal Control and CO2 Emission Optimization Based on Deep Reinforcement Learning