Jul, 2022
Clover: Towards A Unified Video-Language Alignment and Fusion Model
Jingjia Huang, Yinan Li, Jiashi Feng, Xiaoshuai Sun, Rongrong Ji
TL;DR
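This paper proposes Clover, a method that improves cross-modal feature alignment and fusion through a novel tri-modal alignment pre-training task. It further strengthens tri-modal alignment by learning from semantically masked samples and by using a new pair-wise ranking loss. Clover sets a new state of the art on multiple downstream tasks, including three retrieval tasks in both zero-shot and fine-tuning settings and eight video question answering tasks.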
Abstract
Building a universal video-language model for solving various video understanding tasks (e.g., text-video retrieval, video question answering) is an open challenge to the machine learning field. Towards this goal