Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing the system's limits, but all use cases share a key requirement: reliability. A simulation agent should behave as intended by the designer, minimizing unintended actions like collisions that can compromise the signal-to-noise ratio of analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents nearly solve the full training set within a day. They generalize effectively to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. We open-source both the pre-trained agents and the complete code base. Demonstrations of agent behaviors can be found at \url{https://sites.google.com/view/reliable-sim-agents}.

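This work addresses the challenge of reliability for simulation agents in systems that interact with humans, particularly autonomous vehicles. By scaling self-play training on the Waymo Open Motion Dataset, the authors obtain agents that reach a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents, demonstrating effective generalization and robustness across different scenarios. The approach substantially improves the reliability of simulated driving agents and has potential impact on practical applications.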

Building Reliable Sim Driving Agents by Scaling Self-Play