BriefGPT.xyz
Jun, 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai...
TL;DR
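This paper proposes a global and local semantic completion learning (GLSCL) task that aims to promote both global-local alignment and local-local alignment. The task consists of masked global semantic completion (MGSC) and masked local token completion (MLTC), which use cross-modal interaction to fill in the missing semantics of masked data and recover its global and local features. Experiments show that the method achieves state-of-the-art performance on a range of vision-language benchmarks.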
Abstract
Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, inspired by the success of