Apr, 2024

LaDiC:扩散模型在图像生成的文本方面真的不如自回归模型吗?

TL;DRDiffusion models have the potential for enhancing image-to-text generation and surpass Auto-Regressive models by introducing LaDiC, which incorporates context modeling, a dedicated latent space for captions, a regularization module, a diffuser for semantic conversion, and a Back&Refine technique, achieving state-of-the-art performance on the MS COCO dataset without pre-training or ancillary modules.