BriefGPT.xyz
Nov, 2021
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity
Pritam Sarkar, Ali Etemad
TL;DR
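CrissCross is a self-supervised framework for learning audio-visual representations that also learns asynchronous cross-modal relations. Its effectiveness is shown on several downstream tasks, and on the Kinetics-Sound dataset it performs better than or on par with current self-supervised methods. Pretrained models are also provided.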
Abstract
We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework whereby in addition to learning the intra-modal and standard 'synchronou