Oct, 2022
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu...
TL;DR
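This paper surveys vision-language pre-training methods for multimodal intelligence, grouping them into three categories: image-text tasks, core computer vision tasks, and video-text tasks. For each category, it reviews the methods tailored to those tasks, examines research progress and open challenges, and discusses more advanced topics.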
Abstract
This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches into three categories: ($i$) VLP for …