end-to-end autonomous driving has garnered widespread attention. Current
end-to-end approaches largely rely on the supervision from perception tasks
such as detection, tracking, and map segmentation to aid in learning scene
representations. However, these methods require extensive anno