Jun, 2024
mOSCAR:一个大规模的多语言和多模态的文档级语料库
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral, Armel Zebaze, Pedro Ortiz Suarez, Julien Abadji, Rémi Lacroix...
TL;DRMultimodal Large Language Models (mLLMs) that are trained on caption-like and interleaved text-image data, such as mOSCAR, show improved in-context learning capabilities, boost in few-shot learning performance across various multilingual image-text tasks and benchmarks, and address the limitation of current multilingual and multimodal datasets.