BriefGPT.xyz
Oct, 2022
视觉和语言变换器是否学习了基于谓词和名词的依赖关系?
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
HTML
PDF
Mitja Nikolaus, Emmanuelle Salin, Stephane Ayache, Abdellah Fourtassi, Benoit Favre
TL;DR
本文研究视觉-语言建模,通过创建新的多模态任务和分析预训练数据的质量,发现预训练数据的质量和多模态预训练目标对模型的性能影响重要。
Abstract
Recent advances in
vision-and-language modeling
have seen the development of
transformer architectures
that achieve remarkable performance on
mul
→