Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Jun, 2023

Yida Chen, Fernanda Viégas, Martin Wattenberg

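TL;DR: This paper investigates the internal mechanisms by which latent diffusion models (LDMs) produce realistic images. Using linear probes, it finds that the LDM's internal activations encode linear representations of simple scene geometry and of the distinction between salient objects and background. These representations emerge early in the denoising process, play a causal role in the LDM's image synthesis, and can be used for simple high-level edits.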
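To make the idea of a linear probe concrete, here is a minimal sketch on synthetic data. It is not the authors' code: the activations, the planted direction, and the helper names `train_linear_probe` and `probe_accuracy` are all hypothetical stand-ins, and a real experiment would read activations from an actual LDM rather than a random generator. The sketch only illustrates the general technique, which is to fit a linear classifier on frozen activations and check how well it predicts a property such as salient object versus background on held-out samples.

```python
import numpy as np


def train_linear_probe(acts, labels, lr=0.5, steps=500):
    """Fit logistic-regression weights (w, b) on frozen activations by gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels              # gradient of the mean log-loss w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(acts, labels, w, b):
    """Fraction of samples whose binary label the linear probe predicts correctly."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# Hypothetical data: random vectors standing in for per-location model activations,
# with a binary label (e.g. salient object vs. background) planted along one direction.
rng = np.random.default_rng(0)
d = 32
direction = rng.normal(size=d)
acts = rng.normal(size=(2000, d))
labels = (acts @ direction > 0).astype(float)

# Train on the first 1500 samples, evaluate on the held-out 500.
w, b = train_linear_probe(acts[:1500], labels[:1500])
acc = probe_accuracy(acts[1500:], labels[1500:], w, b)
print(f"held-out probe accuracy: {acc:.3f}")
```

High held-out accuracy from such a probe suggests the property is linearly decodable from the activations. On its own it does not show that the model uses that representation, which is why the summary also highlights the causal role of these representations.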

Abstract

Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work,