Nov, 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado, Yi Li, Nuno Vasconcelos
TL;DR
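This paper introduces a self-supervised pretext task for learning audio-visual representations. By combining a transformer architecture with audio-visual spatial alignment, the approach improves the network's perception and learning efficiency. The results show good performance on several tasks, including audio-visual correspondence, spatial alignment, action recognition, and video semantic segmentation.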
Abstract
We introduce a novel self-supervised pretext task for learning representations from audio-visual content. Prior work on audio-visual repre