BriefGPT.xyz
Mar, 2024
构建多语言视觉文本数据集揭示视觉语言模型的多语言能力
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
HTML
PDF
Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura...
TL;DR
我们通过模板构建了四种语言的多语言视觉文本数据集,介绍了九项视觉语言任务,并引入了解释机制以评估大型语言模型在视觉语言任务上的表现。
Abstract
large language models
(LLMs) have increased interest in
vision language models
(VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, b
→