BriefGPT.xyz
Feb, 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero, Thamar Solorio
TL;DR
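Q-ViD is a simple approach to video question answering. It uses a single instruction-aware open vision-language model (InstructBLIP) to generate descriptions of the video frames, then combines those descriptions with a large language model (LLM) to perform multiple-choice question answering, achieving performance comparable to or better than current state-of-the-art models.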
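The two-stage flow described above (describe each frame, then let an LLM pick an option) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the function names, and the answer-parsing rule are all assumptions, and the two models are passed in as plain callables (`describe_frame` standing in for InstructBLIP, `ask_llm` for the LLM) so that no specific library API is implied.

```python
from typing import Any, Callable, Sequence

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_caption_instruction(question: str) -> str:
    # Hypothetical instruction text; the exact wording is an assumption.
    return "Describe the parts of this image that help answer the question: " + question


def build_qa_prompt(captions: Sequence[str], question: str, options: Sequence[str]) -> str:
    # Concatenate per-frame descriptions, the question, and lettered options.
    lines = ["Frame descriptions:"]
    lines += [f"Frame {i + 1}: {c}" for i, c in enumerate(captions)]
    lines.append(f"Question: {question}")
    lines += [f"({LETTERS[i]}) {o}" for i, o in enumerate(options)]
    lines.append("Answer with the letter of the best option.")
    return "\n".join(lines)


def parse_choice(reply: str, num_options: int) -> int:
    # Read the first character of the reply (ignoring a leading parenthesis)
    # as the chosen letter; anything else is treated as unparseable.
    first = reply.strip().lstrip("(").upper()[:1]
    if first and first in LETTERS[:num_options]:
        return LETTERS.index(first)
    raise ValueError(f"could not parse an option letter from: {reply!r}")


def answer_video_question(
    frames: Sequence[Any],
    question: str,
    options: Sequence[str],
    describe_frame: Callable[[Any, str], str],  # stand-in for the vision-language model
    ask_llm: Callable[[str], str],              # stand-in for the LLM
) -> int:
    # Stage 1: one description per frame, using an instruction built from the question.
    instruction = build_caption_instruction(question)
    captions = [describe_frame(frame, instruction) for frame in frames]
    # Stage 2: the LLM sees only text and picks one of the options.
    prompt = build_qa_prompt(captions, question, options)
    return parse_choice(ask_llm(prompt), len(options))
```

Keeping the models behind callables means the same skeleton can be exercised with stubs, and the second stage never touches pixels, only the text produced in the first stage.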
Abstract
We present Q-ViD, a simple approach for video question answering (video QA), that unlike prior methods, which are based on complex architectures, computationally expensive pipelines or use closed models like GPTs, ...