May, 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu, Gargi Ghosh, Po-Yao Huang, Prahal Arora, Masoumeh Aminzadeh...
TL;DR
We present a simplified, task-agnostic multi-modal pre-training approach that can accept video or text input, or both, for a variety of end tasks. Experimental results show stronger performance than previous methods across multiple tasks, often outperforming task-specific pre-training.
Abstract
We present a simplified, task-agnostic multi-modal pre-training approach that can accept either video or text input, or both, for a variety of end tasks. Existing pre-training methods are task-specific, adopting either …
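The core idea the abstract describes is a single shared encoder that can consume video features, text tokens, or both. A minimal sketch of such an architecture is below; this is a hypothetical illustration, not the authors' implementation, and all names (`SharedVideoTextEncoder`, the dimensions, the layer counts) are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class SharedVideoTextEncoder(nn.Module):
    """Hypothetical sketch of a task-agnostic encoder: one transformer
    that accepts video features, text tokens, or both, so the same
    model can serve retrieval-style and fusion-style end tasks."""

    def __init__(self, vocab_size=1000, video_feat_dim=512, d_model=256):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Project pre-extracted video features into the shared token space.
        self.video_proj = nn.Linear(video_feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text_ids=None, video_feats=None):
        parts = []
        if video_feats is not None:
            parts.append(self.video_proj(video_feats))
        if text_ids is not None:
            parts.append(self.text_embed(text_ids))
        # Concatenate whichever modalities are present along the sequence axis.
        x = torch.cat(parts, dim=1)
        return self.encoder(x)
```

Because both modalities live in one token sequence, the same weights can be pre-trained once and reused whether an end task supplies text only, video only, or both.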