BriefGPT.xyz
May, 2024
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Abdelrahman Abdelhamed, Mahmoud Afifi, Alec Go
TL;DR
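A simple yet effective approach to zero-shot image classification using multimodal large language models (multimodal LLMs): the model generates comprehensive textual representations of the input image, these are mapped to fixed-dimensional features in a cross-modal embedding space, and the features are fused in a linear classifier to perform classification, with impressive results.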
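The fuse-then-classify step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that fusion is a plain average of L2-normalized feature vectors and that the linear classifier's weights are class-label embeddings from the same cross-modal space. The function names (`fuse_features`, `classify`) and the toy three-dimensional vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_features(features):
    """Assumed fusion rule: average the normalized features, then renormalize."""
    normalized = [l2_normalize(f) for f in features]
    dim = len(normalized[0])
    mean = [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def classify(fused, class_weights):
    """Linear classifier: one dot-product score per class, highest score wins."""
    scores = {
        label: sum(w * x for w, x in zip(l2_normalize(weights), fused))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get), scores


# Toy stand-ins for encoder outputs (hypothetical values, not real embeddings).
image_feature = [0.9, 0.1, 0.0]        # feature of the image itself
description_feature = [0.8, 0.2, 0.1]  # feature of an LLM-generated description
class_weights = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}

fused = fuse_features([image_feature, description_feature])
label, scores = classify(fused, class_weights)
print(label, scores)
```

In this sketch no training takes place: the classifier weights come directly from the class-label vectors, which is what would make such a setup zero-shot.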
Abstract
Large language models (LLMs) have been effectively used for many computer vision tasks, including image classification. In this paper, we present a simple yet effective approach for zero-shot image classification ...