We present Viewset Diffusion: a framework for training image-conditioned 3D generative models from 2D data. Image-conditioned 3D generative models allow us to address the inherent ambiguity in single-view 3D reconstruction. Given one image of an object, there is often more than one possible 3D volume that matches the input image, because a single image never captures all sides of an object. Deterministic models are inherently limited to producing one possible reconstruction and therefore make mistakes in ambiguous settings. Modelling distributions of 3D shapes is challenging because 3D ground truth data is often not available. We propose to solve the issue of data availability by training a diffusion model which jointly denoises a multi-view image set.We constrain the output of Viewset Diffusion models to a single 3D volume per image set, guaranteeing consistent geometry. Training is done through reconstruction losses on renderings, allowing training with only three images per object. Our design of architecture and training scheme allows our model to perform 3D generation and generative, ambiguity-aware single-view reconstruction in a feed-forward manner. Project page: szymanowiczs.github.io/viewset-diffusion.

借助Viewset Diffusion框架，可以从2D数据中训练图像条件化的3D生成模型，从而解决单视图3D重建中的歧义问题，并通过对多视图图像集的去噪扩展了3D真实数据的可用性，通过仅渲染3张图片，我们的模型可以执行3D生成和单视图重建。

视图集扩散：从二维数据生成(0-)图像条件下的三维生成模型