CVPR, Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
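TL;DR: The authors relax the data collection pipeline used for Conceptual Captions 3M (CC3M) [Sharma et al. 2018] to build the larger Conceptual 12M (CC12M) dataset. They benchmark CC12M on several downstream tasks that target long-tail visual recognition. The results show that scaling up the pre-training data makes it more effective for vision-and-language tasks.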