BriefGPT.xyz
Sep, 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao...
TL;DR
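DiST is a dual-encoder architecture in which the pre-trained foundation model serves as the spatial encoder and a lightweight network is introduced as the temporal encoder. An integration branch is inserted to fuse the spatial and temporal information, which disentangles spatial and temporal learning for video and improves performance.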
Abstract
Recently, large-scale pre-trained language-image models like CLIP have shown extraordinary capabilities for understanding spatial contents, but naively transferring such models to video recognition still suffers...