Apr 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
Wanrong Zhu, Jack Hessel, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge...
TL;DR: Multimodal C4 is a publicly available corpus of images interleaved with text, intended for training in-context vision-and-language models. A linear assignment algorithm is used to place images within the text.