BriefGPT.xyz
Oct, 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
TL;DR
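This paper proposes a framework that combines pre-trained discriminative vision-language models with pre-trained generative video-to-text and text-to-text models. It introduces two key modifications in the zero-shot setting that improve the performance of vision-language models, and it shows consistent improvements in video understanding.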
Abstract
Vision-language models (VLMs) classify the query video by calculating a similarity score between the visual features and text-based class …
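The similarity-based classification that the abstract describes can be illustrated with a minimal sketch. It assumes the video and each class name have already been encoded into feature vectors by some VLM; the toy vectors, the function names, and the `temperature` value are hypothetical and are not taken from the paper. The sketch shows only the generic zero-shot scoring step (cosine similarity followed by a softmax), not the Videoprompter framework itself.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_classify(video_feature, class_text_features, temperature=0.01):
    """Pick the class whose text feature is most similar to the video feature.

    video_feature: list of floats (assumed output of a visual encoder).
    class_text_features: dict mapping class name -> list of floats
        (assumed output of a text encoder).
    temperature: hypothetical softmax temperature; smaller values sharpen
        the resulting distribution.
    """
    names = list(class_text_features)
    sims = [cosine_similarity(video_feature, class_text_features[n]) for n in names]
    logits = [s / temperature for s in sims]
    # Subtract the maximum logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    probs = {n: e / total for n, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy, hand-made features standing in for real encoder outputs.
video_feature = [0.9, 0.1, 0.0]
class_text_features = {
    "playing guitar": [1.0, 0.0, 0.0],
    "swimming": [0.0, 1.0, 0.0],
    "cooking": [0.0, 0.0, 1.0],
}

predicted, probabilities = zero_shot_classify(video_feature, class_text_features)
print(predicted)
```

In this setup, changing the class text features changes the prediction without retraining anything, which is why richer text-side class representations can matter in the zero-shot setting.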