Traffic Signal Control (TSC) aims to reduce the average travel time of
vehicles in a road network, which in turn enhances fuel utilization efficiency,
air quality, and road safety, benefiting society as a whole. Due to the
complexity of long-horizon control and coordination, most prior TSC methods
leverage deep reinforcement learning (RL) to search for a control policy and
have achieved great success. However, TSC still faces two significant
challenges. 1) The travel time of a vehicle is delayed feedback on the
effectiveness of the TSC policy at each intersection, since it is available
only after the vehicle has left the road network. Although several heuristic
reward functions have been proposed as substitutes for travel time, they are
usually biased and do not lead the policy to improve in the correct direction. 2) The
traffic condition at each intersection is influenced by non-local
intersections, since vehicles traverse multiple intersections over time.
Therefore, the TSC agent must leverage both local observations and
non-local traffic conditions to comprehensively predict the long-horizon
traffic conditions at each intersection. To address these challenges,
we propose DenseLight, a novel RL-based TSC method that employs an unbiased
reward function to provide dense feedback on policy effectiveness and a
non-local enhanced TSC agent to better predict future traffic conditions for
more precise traffic control. Extensive experiments and ablation studies
demonstrate that DenseLight can consistently outperform advanced baselines on
various road networks with diverse traffic flows. The code is available at
this https URL

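This work proposes a novel reinforcement learning method for traffic signal control (TSC) that uses an unbiased reward function to provide dense feedback and a non-local enhanced TSC agent to better predict traffic conditions for more precise traffic control. Extensive experiments and ablation studies verify that it outperforms advanced baseline methods.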