3D object detection is an essential task for autonomous driving. Existing anchor-based detection methods rely on heuristic, empirically tuned anchor settings, which makes the algorithms inelegant. Recent years have seen the rise of several generative models, among which diffusion models show great potential for learning the transformation between two distributions. Our proposed Diff3Det brings the diffusion model to proposal generation for 3D object detection by treating the detection boxes as generative targets. During training, the object boxes are diffused from the ground-truth boxes to a Gaussian distribution, and the decoder learns to reverse this noising process. At inference, the model progressively refines a set of random boxes into the prediction results. We provide detailed experiments on the KITTI benchmark and achieve promising performance compared to classical anchor-based 3D detection methods.
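To make the training-time noising step concrete, the following is a minimal sketch of diffusing ground-truth boxes toward a Gaussian. It is not the authors' implementation: the cosine noise schedule, the number of steps, the five-parameter normalized box encoding, and the names `cosine_alpha_bar` and `diffuse_boxes` are all assumptions chosen for illustration.

```python
import numpy as np

def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..num_steps.

    The cosine schedule is an assumption; the abstract does not name one.
    """
    t = np.arange(num_steps + 1) / num_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]

def diffuse_boxes(gt_boxes, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    gt_boxes: array of shape (num_boxes, box_dim) with normalized box parameters.
    Returns the noisy boxes and the noise that was added.
    """
    noise = rng.standard_normal(gt_boxes.shape)
    a = alpha_bar[t]
    noisy = np.sqrt(a) * gt_boxes + np.sqrt(1.0 - a) * noise
    return noisy, noise

rng = np.random.default_rng(0)
alpha_bar = cosine_alpha_bar(1000)            # hypothetical step count
gt = np.array([[0.2, -0.1, 0.05, 0.4, 0.3]])  # hypothetical normalized box

x_early, _ = diffuse_boxes(gt, 10, alpha_bar, rng)    # still close to the ground truth
x_late, _ = diffuse_boxes(gt, 1000, alpha_bar, rng)   # essentially pure Gaussian noise
```

Under these assumptions, a decoder would be trained to recover `gt` from `noisy` at a randomly drawn step `t`.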

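Diff3Det uses a diffusion model for proposal generation in 3D object detection, treating the detection boxes as generative targets. During training, object boxes are diffused from the ground-truth boxes to a Gaussian distribution, and the decoder learns to reverse this noising process. At inference, the model progressively refines a set of random boxes into predictions, showing promising performance on the KITTI benchmark relative to classical anchor-based 3D detection methods.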
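The inference-time refinement of random boxes can be sketched in the same spirit. The deterministic DDIM-style update, the linear beta schedule, the step counts, and the stand-in `toy_decoder` below are assumptions made for illustration; the summary above does not specify the sampler, and a real decoder would condition on point-cloud features.

```python
import numpy as np

def refine_boxes(decoder, alpha_bar, num_boxes, box_dim, steps, rng):
    """Refine Gaussian random boxes with a deterministic DDIM-style update.

    decoder(x, t) is assumed to return a prediction of the clean boxes x_0.
    """
    x = rng.standard_normal((num_boxes, box_dim))       # start from random boxes
    T = len(alpha_bar) - 1
    times = np.linspace(T, 0, steps + 1).round().astype(int)
    for t, t_prev in zip(times[:-1], times[1:]):
        x0_pred = decoder(x, t)
        a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
        # Noise implied by the current boxes and the predicted clean boxes.
        eps = (x - np.sqrt(a_t) * x0_pred) / np.sqrt(max(1.0 - a_t, 1e-12))
        # Move to the less noisy step t_prev.
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

# Hypothetical linear beta schedule with alpha_bar[0] = 1 (no noise at t = 0).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])

# Stand-in decoder that always predicts one fixed box (illustration only).
target = np.array([0.2, -0.1, 0.05, 0.4, 0.3])
def toy_decoder(x, t):
    return np.tile(target, (x.shape[0], 1))

rng = np.random.default_rng(0)
boxes = refine_boxes(toy_decoder, alpha_bar, num_boxes=8, box_dim=5, steps=4, rng=rng)
```

With this stand-in decoder every random box collapses onto `target` at the final step, which only illustrates the update rule, not detection quality.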

Diffusion-based 3D Object Detection with Random Boxes