BriefGPT.xyz
Dec, 2022
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li...
TL;DR
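Proposes the Img2Prompt module, which supplies prompts that describe the image content together with self-constructed question-answer pairs, enabling LLMs to perform zero-shot VQA without end-to-end training.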
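As a rough illustration of the idea in the TL;DR, the sketch below assembles a text-only prompt from image captions and synthetic question-answer pairs. The function name `build_vqa_prompt`, the `Contexts:` / `Question:` / `Answer:` template, and the sample inputs are all assumptions made for illustration; they are not taken from the paper, and the captioning and question-generation steps are assumed to have run elsewhere.

```python
from typing import List, Tuple


def build_vqa_prompt(
    captions: List[str],
    qa_pairs: List[Tuple[str, str]],
    question: str,
) -> str:
    """Assemble a text-only prompt for a frozen LLM (hypothetical template)."""
    lines = []
    # Image content expressed as text, here a plain concatenation of captions.
    lines.append("Contexts: " + " ".join(c.strip() for c in captions))
    # Self-constructed question-answer pairs act as in-context exemplars.
    for q, a in qa_pairs:
        lines.append(f"Question: {q} Answer: {a}")
    # The target question is left open for the LLM to complete.
    lines.append(f"Question: {question} Answer:")
    return "\n".join(lines)


prompt = build_vqa_prompt(
    captions=["A dog runs on grass."],
    qa_pairs=[("What animal is this?", "dog")],
    question="What is the dog doing?",
)
print(prompt)
```

The resulting string can be passed to any text-only LLM; no model weights are updated, which is the sense in which the LLM stays frozen.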
Abstract
Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question-answering (VQA) remains challenging, p