May, 2023
Data Compression for Vision-Language Pre-Training
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou
TL;DR
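This paper proposes TL;DR, a vision-language learning algorithm that uses an encoder-decoder-based model to select representative samples and to generate new captions for them, with the aim of compressing existing large-scale VLP data into a small, high-quality dataset. Experiments show that the dataset compressed by TL;DR yields results on many downstream tasks that are similar to, or even better than, those obtained with the full dataset.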
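The selection stage summarized above can be illustrated with a minimal sketch. It is not the paper's method: the cosine-similarity alignment filter and the greedy farthest-point selection are stand-in techniques, and the names `reduce_dataset`, `keep`, and `min_alignment` are hypothetical. The caption-regeneration stage is omitted because it needs a trained captioning model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduce_dataset(pairs, keep, min_alignment=0.0):
    """Return indices of at most `keep` representative image-text pairs.

    `pairs` is a list of (image_embedding, text_embedding, caption) tuples.
    Assumption: the embeddings come from some pretrained encoder outside this sketch.
    """
    # Step 1 (misalignment): drop pairs whose image and text embeddings disagree.
    candidates = [
        i for i, (img, txt, _) in enumerate(pairs)
        if cosine(img, txt) >= min_alignment
    ]
    if not candidates or keep <= 0:
        return []

    # Step 2 (redundancy): greedy farthest-point selection on image embeddings,
    # seeded with the best-aligned pair.
    selected = [max(candidates, key=lambda i: cosine(pairs[i][0], pairs[i][1]))]
    while len(selected) < min(keep, len(candidates)):
        remaining = [i for i in candidates if i not in selected]

        def novelty(i):
            # Cosine distance to the closest already-selected sample.
            return min(1.0 - cosine(pairs[i][0], pairs[s][0]) for s in selected)

        selected.append(max(remaining, key=novelty))
    return selected


# Toy data: pair 1 nearly duplicates pair 0, and pair 3 is misaligned.
pairs = [
    ((1.0, 0.0), (1.0, 0.0), "a dog"),
    ((0.99, 0.1), (1.0, 0.0), "a dog outside"),
    ((0.0, 1.0), (0.0, 1.0), "a cat"),
    ((1.0, 0.0), (0.0, 1.0), "unrelated text"),
]
kept = reduce_dataset(pairs, keep=2, min_alignment=0.5)
print([pairs[i][2] for i in kept])
```

In this toy run the misaligned pair is filtered out first, and the near-duplicate is passed over in favor of the more distinct sample, which mirrors the two problems (misalignment and redundancy) that the summary names.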
Abstract
This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale vision-language pre-training (VLP) datasets. To address these issues, we propose an efficient …