May, 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh
TL;DR
Through an experimental study on four challenging fine-grained benchmarks, this paper finds that X-VLM is the best-performing model, and highlights that novel loss functions and rich data sources are important for learning fine-grained skills.
Abstract
While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding.