BriefGPT.xyz
Dec, 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li, Zhe Gan, Jingjing Liu
TL;DR
Through a comprehensive robustness evaluation of existing pre-trained models, this work proposes MANGO, a method that learns a multimodal adversarial noise generator in the embedding space. It substantially improves the robustness of pre-trained vision-and-language models and sets a new state of the art on seven robustness benchmarks.
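The core idea of adversarial noise in embedding space can be illustrated with a toy, single-step sketch. This is only a hedged illustration, not MANGO's actual generator: the linear "model", the squared-error loss, and the FGSM-style sign step are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss_and_grad(emb, w, y):
    """Squared-error loss (w . emb - y)^2 and its gradient w.r.t. the embedding."""
    err = emb @ w - y
    return err ** 2, 2.0 * err * w

emb = rng.normal(size=4)   # stand-in for a fused vision-and-language embedding
w = rng.normal(size=4)     # stand-in for a downstream task head
y = 1.0                    # target output

eps = 0.1                  # perturbation budget
loss_before, grad = toy_loss_and_grad(emb, w, y)
adv_emb = emb + eps * np.sign(grad)   # noise chosen to increase the loss
loss_after, _ = toy_loss_and_grad(adv_emb, w, y)

print(loss_before, loss_after)  # the perturbed embedding yields a larger loss
```

Training against such loss-increasing perturbations (here a single analytic step; in the paper, a learned generator) is what makes the resulting model more robust.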
Abstract
Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level. Although achieving impressive performance on …