BriefGPT.xyz
Apr, 2022
Cross-stitched Multi-modal Encoders
Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim...
TL;DR
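This paper proposes a new architecture for multi-modal speech and text input. It uses multi-head cross-attention to combine pretrained speech and text encoders, which are then jointly fine-tuned on the target problem. The resulting encoder can be used for continuous token-level classification or for utterance-level prediction over simultaneous text and speech, and it efficiently captures both acoustic-prosodic and lexical information.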
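To make the fusion step concrete, here is a minimal sketch of multi-head cross-attention between two encoder outputs. It is an illustration under stated assumptions, not the authors' implementation: the choice of text tokens as queries and speech frames as keys and values, the helper names (`matmul`, `softmax`, `random_matrix`, `cross_attention`), the dimensions, and the random stand-ins for pretrained encoder outputs are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix; stands in for learned weights or encoder outputs."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention(text_states, speech_states, params, num_heads):
    """Each text token (query) attends over all speech frames (keys, values).

    Assumption: the query/key-value roles are chosen for illustration only.
    """
    w_q, w_k, w_v, w_o = params
    q = matmul(text_states, w_q)
    k = matmul(speech_states, w_k)
    v = matmul(speech_states, w_v)
    head_dim = len(q[0]) // num_heads
    scale = math.sqrt(head_dim)
    fused = [[] for _ in text_states]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        for t, q_row in enumerate(q):
            # Scaled dot-product scores of this token against every frame.
            scores = [
                sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi])) / scale
                for k_row in k
            ]
            weights = softmax(scores)
            # Weighted sum of the value vectors for this head's slice.
            context = [
                sum(w * v_row[d] for w, v_row in zip(weights, v))
                for d in range(lo, hi)
            ]
            fused[t].extend(context)  # concatenate heads per token
    return matmul(fused, w_o)  # output projection


rng = random.Random(0)
dim, num_heads = 8, 2
text_states = random_matrix(5, dim, rng)     # stand-in: 5 token embeddings
speech_states = random_matrix(20, dim, rng)  # stand-in: 20 acoustic frames
params = [random_matrix(dim, dim, rng) for _ in range(4)]

fused = cross_attention(text_states, speech_states, params, num_heads)
print(len(fused), len(fused[0]))  # 5 8
```

The output keeps one vector per text token, which is why a layout like this would suit token-level classification; pooling those vectors (for example, averaging them) would give a single representation for utterance-level prediction.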
Abstract
In this paper, we propose a novel architecture for multi-modal speech and text input. We combine pretrained ...