BriefGPT.xyz
Feb, 2022
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son Tran, Yi Xu, Sampath Chanda...
TL;DR
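This work proposes TCL, a triple contrastive learning framework for vision-language pre-training. It improves the learned representations through cross-modal alignment and intra-modal self-supervision, and additionally maximizes the average mutual information between local regions of an image or text and their global summary. The resulting model performs strongly on tasks such as image-text retrieval and visual question answering.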
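As a rough illustration of the cross-modal alignment term, the sketch below computes a symmetric InfoNCE-style contrastive loss over a batch of paired image and text embeddings. It is a minimal example under stated assumptions, not the authors' implementation: the function name `info_nce`, the temperature of 0.07, and the toy embeddings are all hypothetical choices made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; temperature is an assumed value, not from the source."""
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in text_emb]
    # Pairwise similarity matrix: rows index images, columns index texts.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Toy data (hypothetical): row i of each list forms a matching image-text pair.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_emb = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned = info_nce(image_emb, text_emb)
shuffled = info_nce(image_emb, text_emb[1:] + text_emb[:1])
```

Here `aligned` should come out smaller than `shuffled`, since the loss rewards matched pairs for being more similar than mismatched ones. The intra-modal and local-global mutual information terms described above would be additional objectives and are not shown in this sketch.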
Abstract
Vision-language representation learning largely benefits from image-text alignment through contrastive losses (e.g., InfoNCE loss). The success of this alignment strategy is attributed to its capability in maximi