BriefGPT.xyz
Aug, 2024
Text-Guided Video Masked Autoencoder
David Fan, Jue Wang, Shuai Liao, Zhikang Zhang, Vimal Bhat...
TL;DR
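This work addresses the limitations of existing video masked autoencoders (MAE) in processing visual information and proposes a novel text-guided masking algorithm (TGM), which masks the video regions that correspond most strongly to the paired caption rather than relying on explicit visual cues. The study shows that TGM outperforms conventional masking algorithms on video recognition tasks, demonstrating the complementary value of natural language in video modeling.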
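The masking idea in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: the summary only states that TGM masks the regions with the highest correspondence to the paired caption. The cosine-similarity score, the function name `text_guided_mask`, the `mask_ratio` parameter and its default, and the premise that patch and caption embeddings share one space are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_guided_mask(patch_embeddings, caption_embedding, mask_ratio=0.75):
    """Return a list of booleans, one per patch (True = masked).

    Assumption: patch and caption embeddings live in a shared space, e.g.
    produced by some pretrained vision-language encoder. How they are
    obtained is outside the scope of this sketch.
    """
    # Score every patch by its similarity to the caption.
    scores = [cosine(p, caption_embedding) for p in patch_embeddings]
    num_masked = int(mask_ratio * len(scores))
    # Rank patch indices from most to least caption-aligned.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    return [i in masked for i in range(len(scores))]


# Toy usage: four 2-D "patch" vectors and a caption vector along the first axis.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(text_guided_mask(patches, [1.0, 0.0], mask_ratio=0.5))
```

In this toy case the two patches best aligned with the caption vector are the ones masked, so the autoencoder would have to reconstruct exactly the content the caption describes.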
Abstract
Recent Video Masked Autoencoder (MAE) works have designed improved masking algorithms focused on saliency. These works leverage visual cues such as motion to mask the most salient regions. However, the robustness