Jul, 2023
SAS Video QA: Optimizing Video Question Answering with Adaptive Sampling
SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering
Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria
TL;DR
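The paper proposes two frame-sampling strategies, most dominant frames (MDF) and most implied frames (MIF), designed to retain as many as possible of the frames most relevant to the given question. Validation experiments show that these strategies improve the performance of image-text pretrained models.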
Abstract
Video question-answering is a fundamental task in the field of video understanding. Although current vision-language models (VLMs) equipped with video transformers have enabled temporal modeling and yielded sup