CLIP-Hitchhiker's Guide to Long Video Retrieval

May, 2022

A CLIP-Hitchhiker's Guide to Long Video Retrieval

Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman

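TL;DR: This paper adapts image-text models to long video retrieval. It shows that a query-scored weighted mean of frame embeddings is an effective baseline for temporal modeling, and that this approach improves performance on long video retrieval benchmarks.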
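To make the idea of a query-scored weighted mean of frame embeddings concrete, here is a minimal sketch in plain Python. It assumes, for illustration only, that each frame's weight is a softmax over its cosine similarity to the text query, scaled by a temperature. The function names and the default `temperature` value are hypothetical and are not taken from the paper's code; embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_scored_mean(frame_embs, query_emb, temperature=0.1):
    """Pool frame embeddings into one video embedding, weighting each
    frame by how well it matches the query (assumed softmax weighting)."""
    q = l2_normalize(query_emb)
    frames = [l2_normalize(f) for f in frame_embs]

    # Cosine similarity between each frame and the query.
    sims = [dot(f, q) for f in frames]

    # Numerically stable softmax over similarities / temperature.
    m = max(sims)
    exps = [math.exp((s - m) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted mean of the normalized frame embeddings.
    dim = len(q)
    video = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
    return video, weights


def video_text_score(frame_embs, query_emb, temperature=0.1):
    """Similarity between the query and the query-conditioned video embedding."""
    video, _ = query_scored_mean(frame_embs, query_emb, temperature)
    return dot(video, l2_normalize(query_emb))
```

In this sketch, a small temperature concentrates the weight on the frames most similar to the query, while a very large temperature approaches plain mean pooling over frames.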

Abstract

Our goal in this paper is the adaptation of image-text models for long video retrieval. Recent works have demonstrated state-of-the-art performance in →