We propose \textbf{DMV3D}, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model as the denoiser in multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and denoises noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in $\sim$30s on a single A100 GPU. We train \textbf{DMV3D} on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without access to 3D assets. We demonstrate state-of-the-art results on single-image reconstruction, a problem that requires probabilistic modeling of unseen object parts to generate diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results that outperform previous 3D diffusion models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/

DMV3D: Denoising Multi-View Diffusion Using 3D Large Reconstruction Model