Apr, 2022

Cross-stitched Multi-modal Encoders

Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim...

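TL;DR: This paper proposes a novel architecture for multi-modal speech and text input. It combines pretrained speech and text encoders using multi-headed cross-attention and fine-tunes them jointly on the target problem. The resulting encoder can be used for continuous token-level classification or for utterance-level prediction on simultaneous text and speech, and it efficiently captures both acoustic-prosodic and lexical information.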
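The TL;DR describes fusing two pretrained encoders with multi-headed cross-attention and reading out either per-token or per-utterance predictions. The NumPy sketch below illustrates that idea under stated assumptions. Random arrays stand in for the pretrained encoder outputs. Text tokens are assumed to be the queries and speech frames the keys/values. The residual connection, mean pooling, class counts, and names such as `cross_stitch` are hypothetical choices for illustration, not the authors' implementation. Joint fine-tuning (back-propagating the task loss into both encoders) is not shown.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along `axis` for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(queries, context, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head scaled dot-product attention of `queries` (T x d) over `context` (S x d_ctx)."""
    t, d = queries.shape
    s = context.shape[0]
    assert d % num_heads == 0, "model width must divide evenly into heads"
    d_head = d // num_heads
    # Project, then split the last axis into heads: (num_heads, length, d_head).
    q = (queries @ w_q).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    k = (context @ w_k).reshape(s, num_heads, d_head).transpose(1, 0, 2)
    v = (context @ w_v).reshape(s, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, S)
    weights = softmax(scores, axis=-1)                     # each token's distribution over frames
    heads = weights @ v                                    # (heads, T, d_head)
    # Concatenate the heads back to (T, d) and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(t, d)
    return merged @ w_o, weights


def cross_stitch(text_states, speech_states, params, num_heads=4):
    """Hypothetical fusion step: text tokens attend to speech frames (assumed direction)."""
    attended, weights = multi_head_cross_attention(
        text_states, speech_states,
        params["w_q"], params["w_k"], params["w_v"], params["w_o"], num_heads)
    # Residual connection keeps the lexical signal alongside the attended acoustic signal.
    fused = text_states + attended
    token_logits = fused @ params["w_token"]                 # one prediction per text token
    utterance_logits = fused.mean(axis=0) @ params["w_utt"]  # mean-pooled, one per utterance
    return token_logits, utterance_logits, weights


# Random stand-ins for the outputs of pretrained text and speech encoders.
rng = np.random.default_rng(0)
d_text, d_speech, n_tokens, n_frames = 16, 24, 5, 40
text_states = rng.normal(size=(n_tokens, d_text))
speech_states = rng.normal(size=(n_frames, d_speech))
params = {
    "w_q": rng.normal(scale=d_text ** -0.5, size=(d_text, d_text)),
    "w_k": rng.normal(scale=d_speech ** -0.5, size=(d_speech, d_text)),
    "w_v": rng.normal(scale=d_speech ** -0.5, size=(d_speech, d_text)),
    "w_o": rng.normal(scale=d_text ** -0.5, size=(d_text, d_text)),
    "w_token": rng.normal(size=(d_text, 3)),  # hypothetical 3 token-level classes
    "w_utt": rng.normal(size=(d_text, 2)),    # hypothetical 2 utterance-level classes
}
tok, utt, attn = cross_stitch(text_states, speech_states, params)
print(tok.shape, utt.shape, attn.shape)  # (5, 3) (2,) (4, 5, 40)
```

Because the attention weights have shape (heads, tokens, frames), each text token gets its own distribution over speech frames. This is one way a token-level classifier can draw on prosodic cues aligned with a word even though the text and speech sequences have different lengths, and the separate projections `w_k`/`w_v` let the two encoders have different hidden sizes.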

Abstract

In this paper, we propose a novel architecture for multi-modal speech and text input. We combine pretrained …