Learning-based decision-making has the potential to enable generalizable Autonomous Driving (AD) policies, reducing the engineering overhead of rule-based approaches. Imitation Learning (IL) remains the dominant paradigm, benefiting from large-scale human demonstration datasets, but it suffers from inherent limitations such as distribution shift and imitation gaps. Reinforcement Learning (RL) presents a promising alternative, yet its adoption in AD remains limited due to the lack of standardized and efficient research frameworks. To this end, we introduce V-Max, an open research framework providing all the necessary tools to make RL practical for AD. V-Max is built on Waymax, a hardware-accelerated AD simulator designed for large-scale experimentation. We extend it using ScenarioNet's approach, enabling the fast simulation of diverse AD datasets. V-Max integrates a set of observation and reward functions, transformer-based encoders, and training pipelines. Additionally, it includes adversarial evaluation settings and an extensive set of evaluation metrics. Through a large-scale benchmark, we analyze how network architectures, observation functions, training data, and reward shaping impact RL performance.

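This work addresses the distribution shift and imitation gap problems that imitation learning faces in autonomous driving by proposing V-Max, an open research framework intended to make reinforcement learning more practical for autonomous driving. Built on Waymax, a hardware-accelerated autonomous driving simulator, V-Max integrates multiple observation and reward functions and provides efficient training pipelines, enabling an in-depth analysis of how network architectures, observation functions, and reward design affect reinforcement learning performance.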

V-Max: Making Reinforcement Learning Practical for Autonomous Driving