BriefGPT.xyz
Jan, 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
Yuhui Zhang, Elaine Sui, Serena Yeung-Levy
TL;DR
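A pre-trained multi-modal contrastive representation space makes it possible to learn cross-modal tasks from uni-modal data. We give a theoretical explanation of the geometry of this space and introduce a three-step method (connect, collapse, corrupt) that closes the modality gap and makes embeddings from different modalities more interchangeable. This enables effective cross-modal learning from uni-modal data and achieves state-of-the-art results on zero-shot image/audio/video captioning and text-to-image generation.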
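The summary names the three steps without spelling them out, so the following is only a minimal sketch under stated assumptions. It assumes that "connect" means taking embeddings from a shared pre-trained contrastive encoder (toy vectors stand in for them here), that "collapse" means subtracting each modality's mean embedding, and that "corrupt" means adding Gaussian noise to an embedding during training. The function names, the re-normalization after each step, and the toy data are all hypothetical and are not taken from the authors' code.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(embeddings):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def collapse(embeddings):
    """Assumed 'collapse' step: subtract the modality mean, then re-normalize."""
    mu = mean_vector(embeddings)
    return [normalize([x - m for x, m in zip(e, mu)]) for e in embeddings]


def corrupt(embedding, sigma, rng):
    """Assumed 'corrupt' step: add isotropic Gaussian noise, then re-normalize."""
    return normalize([x + rng.gauss(0.0, sigma) for x in embedding])


# 'Connect' (assumed): both modalities live in one contrastive space.
# Toy data: the "image" embeddings are the "text" embeddings shifted by a
# constant offset, which imitates a modality gap.
text_emb = [normalize([1.0, 0.2, 0.0]), normalize([0.2, 1.0, 0.0])]
gap = [0.0, 0.0, 0.8]
image_emb = [normalize([x + g for x, g in zip(e, gap)]) for e in text_emb]

text_collapsed = collapse(text_emb)
image_collapsed = collapse(image_emb)

# During training on text only, a decoder would see noisy text embeddings.
rng = random.Random(0)
noisy_text = [corrupt(e, sigma=0.1, rng=rng) for e in text_collapsed]

if __name__ == "__main__":
    for t, i in zip(text_collapsed, image_collapsed):
        print("max abs difference after collapse:",
              max(abs(a - b) for a, b in zip(t, i)))
```

In this toy setup the gap is a pure constant offset, so removing the per-modality mean makes paired text and image embeddings coincide. Real embeddings would only be expected to move closer, and the noise in `corrupt` is meant to make a model trained on one modality tolerant of whatever mismatch remains.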
Abstract
Building cross-modal applications is challenging due to limited paired multi-modal data. Recent works have shown that leveraging a pre-trained multi-modal contrastive representation space enables cross-modal task ...