BriefGPT.xyz
May, 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia...
TL;DR
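This paper identifies distortion from language supervision as the factor that degrades video model performance, proposes a degradation-free pre-training strategy, and combines a sorting task with masking for scalable training, yielding a new series of models.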
Abstract
The ultimate goal for foundation models is realizing task-agnostic, i.e., supporting out-of-the-box usage without task-specific fine-tuning. Although breakthroughs have been made in natural language processing an