BriefGPT.xyz
Apr, 2023
视觉任务的视觉语言模型综述
Vision-Language Models for Vision Tasks: A Survey
HTML
PDF
Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu
TL;DR
本文系统回顾了基于语言的视觉模型在各种视觉识别任务中的应用,并总结了广泛采用的网络结构、预训练目标和下游任务,以及预训练和评估中广泛采用的数据集,并回顾和分类现有的预训练方法、传输学习方法和知识蒸馏方法。
Abstract
Most
visual recognition
studies rely heavily on crowd-labelled data in
deep neural networks
(DNNs) training, and they usually train a DNN for each single
→