BriefGPT.xyz
Mar, 2021
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
Chenliang Li, Ming Yan, Haiyang Xu, Fuli Luo, Wei Wang...
TL;DR
This paper proposes SemVLP, a pre-training method that combines single-stream and two-stream pre-training. Using a shared Transformer network with a pluggable cross-modal attention module, it jointly aligns images and text at different semantic levels to align cross-modal representations; experiments show the method aligns semantics at multiple granularities.
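The shared-encoder idea in the TL;DR can be illustrated with a toy sketch. The code below is a minimal, hypothetical NumPy illustration (not the authors' implementation): one layer with shared weights runs in single-stream mode (modalities concatenated, self-attention only) or in two-stream mode, where a pluggable cross-modal attention step is activated only when features from the other modality are supplied. Dimensions, initialization, and the absence of multi-head projections, layer norm, and feed-forward blocks are simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

class SharedLayer:
    """One Transformer-style layer whose weights are reused in both
    modes; the cross-modal attention is 'pluggable': it runs only when
    features from the other modality are passed in."""
    def __init__(self, d, rng):
        self.Wq = rng.standard_normal((d, d)) * 0.02
        self.Wk = rng.standard_normal((d, d)) * 0.02
        self.Wv = rng.standard_normal((d, d)) * 0.02

    def __call__(self, x, cross=None):
        # self-attention over x's own tokens
        h = x + attention(x @ self.Wq, x @ self.Wk, x @ self.Wv)
        if cross is not None:
            # pluggable cross-modal attention: queries from x,
            # keys/values from the other modality (two-stream mode)
            h = h + attention(h @ self.Wq, cross @ self.Wk, cross @ self.Wv)
        return h

rng = np.random.default_rng(0)
layer = SharedLayer(d=16, rng=rng)
text = rng.standard_normal((5, 16))   # 5 text tokens
image = rng.standard_normal((7, 16))  # 7 image-region features

# single-stream: concatenate modalities, self-attention only (low-level alignment)
single = layer(np.concatenate([text, image], axis=0))

# two-stream: separate text stream with cross-attention to image (high-level alignment)
two = layer(text, cross=image)
print(single.shape, two.shape)  # (12, 16) (5, 16)
```

The same `SharedLayer` weights serve both modes, mirroring the paper's claim that one shared network can align semantics at multiple levels depending on how the cross-modal module is plugged in.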
Abstract
Vision-language pre-training (VLP) on large-scale image-text pairs has recently witnessed rapid progress for learning cross-modal representations. Existing pre-training methods either directly concatenate image r…