Modern generative models are usually designed to match target distributions directly in the data space, where the intrinsic dimensionality of data can be much lower than the ambient dimensionality. We argue that this discrepancy may contribute to the difficulties in training generative models. We therefore propose to map both the generated and target distributions to the latent space using the encoder of a standard autoencoder, and train the generator (or decoder) to match the target distribution in the latent space. The resulting method, the perceptual generative autoencoder (PGA), is then combined with a maximum likelihood or variational autoencoder (VAE) objective to train the generative model. With maximum likelihood, PGAs generalize the idea of reversible generative models to unrestricted neural network architectures and arbitrary latent dimensionalities. When combined with VAEs, PGAs can generate sharper samples than vanilla VAEs. Compared to other autoencoder-based generative models using simple priors, PGAs achieve state-of-the-art FID scores on CIFAR-10 and CelebA.

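This paper introduces a new generative model, the perceptual generative autoencoder. The model maps both the generated and target distributions to a latent space and enforces consistency in the data space and the latent space simultaneously, using theoretically justified data and latent reconstruction losses. It thereby generalizes the idea of reversible generative models to unrestricted neural network architectures and arbitrary latent dimensionalities, and it significantly outperforms conventional autoencoders and other autoencoder-based generative models in sample quality.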
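
To make the two reconstruction terms concrete, the following is a minimal sketch, not the authors' implementation. It assumes squared-error losses and uses a hypothetical toy encoder and decoder (`toy_encoder`, `toy_decoder`) in place of neural networks; all function names and dimensions are illustrative assumptions. The data reconstruction loss compares x with g(f(x)), and the latent reconstruction loss compares z with f(g(z)), where f is the encoder and g is the decoder.

```python
import random

def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def data_reconstruction_loss(batch_x, encoder, decoder):
    """Mean of ||x - g(f(x))||^2 over a batch of data points."""
    return sum(squared_error(x, decoder(encoder(x))) for x in batch_x) / len(batch_x)

def latent_reconstruction_loss(batch_z, encoder, decoder):
    """Mean of ||z - f(g(z))||^2 over a batch of latent codes."""
    return sum(squared_error(z, encoder(decoder(z))) for z in batch_z) / len(batch_z)

# Hypothetical toy maps (assumptions for illustration only): the decoder
# embeds a latent code into data space by zero-padding, and the encoder
# reads the first LATENT_DIM coordinates back out.
DATA_DIM, LATENT_DIM = 6, 2

def toy_encoder(x):
    return list(x[:LATENT_DIM])

def toy_decoder(z):
    return list(z) + [0.0] * (DATA_DIM - LATENT_DIM)

rng = random.Random(0)
latents = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(4)]
on_manifold = [toy_decoder(z) for z in latents]        # data the decoder can produce
off_manifold = [[1.0] * DATA_DIM for _ in range(4)]    # data outside the decoder's range

print(latent_reconstruction_loss(latents, toy_encoder, toy_decoder))
print(data_reconstruction_loss(on_manifold, toy_encoder, toy_decoder))
print(data_reconstruction_loss(off_manifold, toy_encoder, toy_decoder))
```

In this toy setting both losses vanish for latent codes and for data lying in the decoder's range, while data outside that range incurs a positive data reconstruction loss. This mirrors the gap between intrinsic and ambient dimensionality that the abstract describes.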