BriefGPT.xyz
Sep, 2023
Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models
Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu
TL;DR
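We propose a zero-shot method that obtains generalizable visual-textual priors from arbitrary vision-language models, and uses a conditional feature refinement module and a bottom-up proposal generation strategy to improve the alignment between video moments and text, yielding significant performance gains in video moment retrieval.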
Abstract
Accurate video moment retrieval (VMR) requires universal visual-textual correlations that can handle unknown vocabulary and unseen scenes. However, the learned correlations are likely either biased when derived f