BriefGPT.xyz
Sep, 2021
Cross-modal Contrastive Learning: Multi-modal Video Representations
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox
TL;DR
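Proposes CrossCLR, a new contrastive loss for cross-modal embedding learning that takes intra-class similarity in the embedding space into account. This avoids mapping the same content to multiple points and significantly improves video-text retrieval and video captioning performance. The method is general and can be used to learn joint embeddings between other modalities.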
Abstract
Contrastive learning allows us to flexibly define powerful losses by contrasting positive pairs from sets of negative samples. Recently, the principle has also been used to learn cross-modal embeddings for video …
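To make "contrasting positive pairs from sets of negative samples" concrete, here is a minimal NumPy sketch of a generic symmetric InfoNCE-style loss between paired video and text embeddings. It is not the CrossCLR loss itself: the summary above says CrossCLR additionally accounts for intra-class similarity, and that is not reproduced here. The function name `info_nce`, the temperature of 0.07, and the random toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def info_nce(video_emb, text_emb, temperature=0.07):
    """Generic symmetric cross-modal contrastive loss (illustrative only).

    Row i of video_emb and row i of text_emb form a positive pair;
    every other row in the batch serves as a negative sample.
    """
    # L2-normalise so that dot products are cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (B, B) similarity matrix; the diagonal holds the positive pairs.
    logits = (v @ t.T) / temperature

    def diagonal_cross_entropy(scores):
        # Subtract the row maximum for numerical stability.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the video-to-text and text-to-video directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


# Toy usage: text embeddings that closely track their paired video embeddings
# should score a lower loss than the same embeddings paired in the wrong order.
rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16))
text = video + 0.01 * rng.normal(size=(8, 16))
loss_aligned = info_nce(video, text)
loss_mismatched = info_nce(video, text[::-1])
```

In this formulation every non-matching sample in the batch is pushed away, including samples that happen to be semantically similar to the anchor, which is the situation the TL;DR describes CrossCLR as addressing.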