BriefGPT.xyz
Jan, 2021
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition
Fusing Wav2vec2.0 and BERT into End-to-end Model for Low-resource Speech Recognition
Cheng Yi, Shiyu Zhou, Bo Xu
TL;DR
This paper studies how to fuse a pretrained acoustic encoder and a pretrained linguistic encoder into an end-to-end automatic speech recognition (ASR) model to improve performance, especially in low-resource ASR settings. Experiments show that the proposed method outperforms other end-to-end models on the 15-hour CALLHOME corpus.
Abstract
Self-supervised acoustic pre-training has achieved impressive results on low-resource speech recognition tasks. It indicates that the pretrain-and-finetune paradigm is a promising direction. In this work, we prop
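To make the fusion idea concrete, here is a minimal sketch of combining an acoustic encoder's frame-level features with a linguistic encoder's token-level features. This is not the paper's implementation: the real models would be wav2vec2.0 and BERT, while here fixed random projections stand in for the pretrained weights, and the fusion mechanism (cross-attention from acoustic frames to linguistic states) is an assumed, illustrative design. All names and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: both wav2vec2.0-base and BERT-base output
# 768-dim features; FUSE_DIM and VOCAB are illustrative choices.
A_DIM, L_DIM, FUSE_DIM, VOCAB = 768, 768, 256, 30

# Stand-in trainable weights for the fusion layers and output head.
W_fuse_a = rng.standard_normal((A_DIM, FUSE_DIM)) * 0.02
W_fuse_l = rng.standard_normal((L_DIM, FUSE_DIM)) * 0.02
W_out = rng.standard_normal((FUSE_DIM, VOCAB)) * 0.02

def acoustic_encoder(frames):
    # Placeholder for wav2vec2.0: (T, A_DIM) features, one per audio frame.
    return rng.standard_normal((frames, A_DIM))

def linguistic_encoder(tokens):
    # Placeholder for BERT: (U, L_DIM) contextual token embeddings.
    return rng.standard_normal((tokens, L_DIM))

def fuse(acoustic, linguistic):
    """Project both encoder outputs into a shared space, let each acoustic
    frame attend over the linguistic states (scaled dot-product attention),
    and add the attended linguistic context back to the frame features."""
    q = acoustic @ W_fuse_a                      # (T, FUSE_DIM)
    k = linguistic @ W_fuse_l                    # (U, FUSE_DIM)
    scores = q @ k.T / np.sqrt(FUSE_DIM)         # (T, U)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    context = attn @ k                           # (T, FUSE_DIM)
    return q + context

acoustic = acoustic_encoder(frames=50)           # ~1 s of audio at 50 fps
linguistic = linguistic_encoder(tokens=12)       # a 12-token transcript hypothesis
logits = fuse(acoustic, linguistic) @ W_out      # (50, 30) per-frame vocab logits
print(logits.shape)
```

The point of the sketch is the shape bookkeeping: the two encoders produce sequences of different lengths (50 frames vs. 12 tokens), so some alignment mechanism, here cross-attention, is needed before the fused features can feed a per-frame output head.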