We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation. Recent diffusion models have shown impressive results in 3D object generation, but remain limited in spatial extent and quality when extended to 3D scenes. To generate complex and diverse 3D scene structures, we introduce a latent tree representation that encodes both lower-frequency geometry and higher-frequency detail in a coarse-to-fine hierarchy. We then learn a generative diffusion process in this latent 3D scene space, modeling the latent components of a scene at each resolution level. To synthesize large-scale scenes of varying sizes, we train our diffusion model on scene patches and synthesize arbitrary-sized output 3D scenes through shared diffusion generation across multiple scene patches. Through extensive experiments, we demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation and for probabilistic completion of partial scene observations.
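To make the coarse-to-fine idea concrete, the following is a minimal sketch of a hand-crafted analogue of such a hierarchy. It is not the learned latent tree of LT3SD: we assume a dense scalar grid (for example a distance field), replace the learned encoder with 2x2x2 average pooling, and store the higher-frequency detail as a plain residual at each level. All function names (`make_grid`, `build_tree`, `reconstruct`, and so on) are hypothetical and chosen for illustration only.

```python
from itertools import product


def make_grid(n, fn):
    """Build an n x n x n nested-list grid from fn(x, y, z)."""
    return [[[fn(x, y, z) for z in range(n)] for y in range(n)] for x in range(n)]


def downsample(grid):
    """Average each 2x2x2 block; keeps the lower-frequency geometry."""
    n = len(grid) // 2
    offsets = list(product((0, 1), repeat=3))
    return make_grid(
        n,
        lambda x, y, z: sum(
            grid[2 * x + dx][2 * y + dy][2 * z + dz] for dx, dy, dz in offsets
        ) / 8.0,
    )


def upsample(grid):
    """Nearest-neighbour upsampling by a factor of 2 along each axis."""
    return make_grid(len(grid) * 2, lambda x, y, z: grid[x // 2][y // 2][z // 2])


def subtract(a, b):
    """Element-wise a - b for two grids of equal size."""
    return make_grid(len(a), lambda x, y, z: a[x][y][z] - b[x][y][z])


def add(a, b):
    """Element-wise a + b for two grids of equal size."""
    return make_grid(len(a), lambda x, y, z: a[x][y][z] + b[x][y][z])


def build_tree(grid, levels):
    """Split a grid into a coarsest grid plus per-level detail residuals.

    Returns (coarse, details), with details ordered from coarse to fine.
    Assumption: the grid side length is divisible by 2 ** levels.
    """
    details = []
    for _ in range(levels):
        coarse = downsample(grid)
        # Detail = what the upsampled coarse grid fails to explain.
        details.append(subtract(grid, upsample(coarse)))
        grid = coarse
    return grid, details[::-1]


def reconstruct(coarse, details):
    """Invert build_tree: refine the coarse grid level by level."""
    grid = coarse
    for detail in details:
        grid = add(upsample(grid), detail)
    return grid
```

In this sketch, calling `build_tree(scene, 2)` on an 8x8x8 grid yields a 2x2x2 coarse grid and residuals of side 4 and 8, and `reconstruct` recovers the input up to floating-point rounding. A generative model in the spirit of the abstract would operate on learned latents at each level rather than on these raw residuals.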

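This work proposes LT3SD, a novel latent diffusion model that addresses the limitations of existing 3D scene generation methods in spatial extent and quality. We introduce a latent tree representation that effectively encodes geometry and detail at different frequencies, improving the ability to generate complex and diverse 3D scenes. Experiments show that LT3SD has clear advantages in large-scale, high-quality unconditional 3D scene generation and in probabilistic completion of partial scene observations.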

LT3SD: Latent Trees for 3D Scene Diffusion