VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

Sep, 2021

Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan...

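TL;DR: This paper presents VideoCLIP, a contrastive learning approach for pre-training a unified model for zero-shot video and text understanding without using any labels from downstream tasks. Our experiments show that the approach achieves state-of-the-art performance across a range of downstream tasks, surpassing prior work and in some cases even outperforming supervised approaches.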
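To make the idea of a contrastive objective over paired video and text embeddings more concrete, here is a minimal sketch of a generic symmetric InfoNCE-style loss. It is not taken from the paper: the function names, the temperature value, and the use of plain in-batch negatives are all illustrative assumptions, and the paper's own way of constructing positive and negative pairs is not reproduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities.

    Assumes the vector is non-zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(scores, index):
    """Return log-softmax of scores[index], using the max-shift trick for stability."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return scores[index] - log_total


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.1):
    """Generic symmetric InfoNCE loss; video i is assumed to be paired with text i.

    `temperature=0.1` is a hypothetical default chosen for illustration only.
    """
    videos = [l2_normalize(v) for v in video_embs]
    texts = [l2_normalize(t) for t in text_embs]
    n = len(videos)

    # sim[i][j]: cosine similarity between video i and text j, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in texts]
        for v in videos
    ]

    # Video-to-text: each row should put its probability mass on the diagonal entry.
    video_to_text = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n

    # Text-to-video: the same, read column by column.
    text_to_video = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n

    return video_to_text + text_to_video


# Toy 2-D embeddings: correctly aligned pairs versus deliberately swapped pairs.
video_batch = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = symmetric_contrastive_loss(video_batch, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = symmetric_contrastive_loss(video_batch, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup, `loss_matched` comes out lower than `loss_swapped`, which is the behaviour a contrastive objective is meant to reward: matching video-text pairs are pulled together and mismatched pairs are pushed apart.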

Abstract

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks. VideoCLIP trains a →