Apr 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo...
TL;DR: Diffusion models have the potential to enhance image-to-text generation and surpass autoregressive models. The paper introduces LaDiC, which combines context modeling, a dedicated latent space for captions, a regularization module, a diffuser for semantic conversion, and a Back&Refine technique, and achieves state-of-the-art performance on the MS COCO dataset without pre-training or ancillary modules.