May, 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen...
TL;DR
This paper studies large-scale Transformer-based pre-trained models for vision-and-language (V+L) tasks; by probing their internal mechanisms, it offers insights and guidance on multimodal pre-training and the behavior of its attention heads.
Abstract
Recent transformer-based large-scale pre-trained models have revolutionized vision-and-language (V+L) research. Models such as ViLBERT, LX