BriefGPT.xyz
Jul, 2022
Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation
Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
TL;DR
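DELLA is a new variational Transformer framework. By coupling intra-layer and inter-layer activations, it propagates the variational auto-encoder's posterior deep into the entire computation path, which reduces information loss and yields better generation quality.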
Abstract
The past several years have witnessed variational auto-encoder's superiority in various text generation tasks. However, due to the sequential nature of the text, auto-regressive decoders tend to ignore …