We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model which directly operates on an explicit voxel grid representation. However, as radiance fields generated from a set of posed images can be ambiguous and contain artifacts, obtaining ground truth radiance field samples is non-trivial. We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality instead of trying to replicate fitting errors like floating artifacts. In contrast to 2D-diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation such as masked completion or single-view 3D synthesis at inference time.
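The pairing of the denoising objective with a rendering loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a standard DDPM noise-prediction objective, a voxel grid with one density channel and three colour channels, and a toy orthographic renderer along the depth axis. The function names (`forward_noise`, `predict_x0`, `render_orthographic`, `combined_loss`) and the `render_weight` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, eps):
    """DDPM forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Recover an estimate of the clean grid from the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def render_orthographic(grid, step=1.0):
    """Toy volume rendering along the depth axis (an assumption, not a real camera model).

    grid has shape (4, D, H, W): channel 0 is density, channels 1..3 are RGB.
    Returns an image of shape (3, H, W).
    """
    sigma = np.maximum(grid[0], 0.0)            # non-negative density, (D, H, W)
    rgb = np.clip(grid[1:4], 0.0, 1.0)          # colours in [0, 1], (3, D, H, W)
    alpha = 1.0 - np.exp(-sigma * step)         # per-sample opacity
    trans = np.cumprod(1.0 - alpha, axis=0)     # light surviving through each sample
    # Shift by one sample so the first sample sees full transmittance.
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)
    weights = alpha * trans                     # compositing weights, (D, H, W)
    return (weights[None] * rgb).sum(axis=1)    # (3, H, W)

def combined_loss(x0, eps, eps_pred, alpha_bar_t, target_image, render_weight=1.0):
    """Noise-prediction loss plus a rendering loss on the denoised estimate."""
    x_t = forward_noise(x0, alpha_bar_t, eps)
    noise_loss = np.mean((eps_pred - eps) ** 2)
    x0_hat = predict_x0(x_t, eps_pred, alpha_bar_t)
    render_loss = np.mean((render_orthographic(x0_hat) - target_image) ** 2)
    return noise_loss + render_weight * render_loss, noise_loss, render_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(0.0, 1.0, size=(4, 8, 8, 8))     # stand-in for a fitted radiance field
    eps = rng.standard_normal(x0.shape)
    target = render_orthographic(x0)                  # stand-in for a posed training image
    # A real model would predict eps from x_t; here a perturbed copy plays that role.
    eps_pred = eps + 0.1 * rng.standard_normal(x0.shape)
    total, noise_loss, render_loss = combined_loss(x0, eps, eps_pred, 0.5, target)
    print(total, noise_loss, render_loss)
```

In this sketch the rendering term compares images rather than voxels, so errors in the grid that do not affect the rendered image contribute nothing to that term, which is the intuition behind a prior that favours image quality over replicating fitting errors.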

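DiffRF is a new method for 3D radiance field synthesis that uses denoising diffusion probabilistic models to generate volumetric radiance fields. By pairing the denoising formulation with a rendering loss, our model learns a deviated prior that favours good image quality instead of trying to replicate fitting errors such as floating artifacts. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation, such as masked completion or single-view 3D synthesis at inference time.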

DiffRF: Rendering-Guided 3D Radiance Field Diffusion