When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus. A universal latent embedding space for sentences is first pre-trained on a large text corpus, and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation at an abstract level using latent vectors. Compared with BERT, Optimus generalizes better on low-resource language understanding tasks because of its smooth latent space structure. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus, which achieves a new state of the art on VAE language modeling benchmarks. We hope that our first pre-trained large VAE language model and its results can help the NLP community renew its interest in deep generative models in the era of large-scale pre-training, and make these principled methods more practical.

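This work proposes Optimus, a model that uses large-scale pre-training followed by fine-tuning to provide a universal latent embedding space for a variety of natural language generation and understanding tasks. Compared with GPT-2, Optimus can use latent vectors to guide language generation at an abstract level. Compared with BERT, Optimus generalizes better on low-resource language understanding tasks thanks to its smooth latent space structure. Experimental results demonstrate the effectiveness of Optimus, which achieves a new state of the art on VAE language modeling benchmarks.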

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space