Aug, 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng...
TL;DR
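This paper introduces BEiT-3, a general-purpose multimodal foundation model that achieves state-of-the-art transfer performance on vision and vision-language tasks through improvements in three areas: backbone architecture, pretraining task, and model scaling.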
Abstract
A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance...