Large Vision-Language Models (LVLMs) have recently garnered significant attention, and many efforts aim to harness their general knowledge to enhance the interpretability and robustness of autonomous driving models. However, LVLMs typically rely on large, general-purpose datasets and lack the specialized expertise required for professional and safe driving. Existing vision-language driving datasets focus primarily on scene understanding and decision-making and provide no explicit guidance on traffic rules and driving skills, even though these aspects are directly related to driving safety. To bridge this gap, we propose IDKB, a large-scale dataset of over one million data items collected from various countries, including driving handbooks, theory test data, and simulated road test data. Much like the process of obtaining a driver's license, IDKB covers nearly all the explicit knowledge needed for driving, from theory to practice. We comprehensively evaluate 15 LVLMs on IDKB to assess their reliability in the context of autonomous driving and provide an extensive analysis. We also fine-tune popular models and achieve notable performance improvements, which further validate the significance of our dataset. The project page can be found at: \url{https://4dvlab.github.io/project_page/idkb.html}

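This study addresses the lack of professional driving expertise in current Large Vision-Language Models applied to autonomous driving and proposes the IDKB dataset to fill this gap. IDKB contains driving handbooks, theory test data, and simulated road test data from multiple countries, providing comprehensive driving knowledge for autonomous driving models. The authors evaluate 15 LVLMs on IDKB and, in addition, fine-tune popular models, which yields notable performance improvements and validates the importance of the dataset.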

Can Large Vision-Language Models Obtain a Driver's License? A Benchmark Study Towards Reliable Artificial General Intelligence