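TL;DR: This paper studies the use of self-supervised pre-training to improve speech recognition accuracy. It finds that, within a supervised learning framework, using different pre-trained self-supervised features as input to Acoustic Word Embeddings is the most effective approach, and that this approach transfers across languages.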
Abstract
In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings.