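TL;DR: This work shows that, without any paired speech and text, a character-level acoustic model (AM) from another language can be used to bootstrap an unsupervised automatic speech recognition system for a new language. The method rests on two main components: generating pseudo-labels for the target language with the other-language AM, and constraining them with a target-language language model.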
Abstract
Recent work has shown that it is possible to train an $\textit{unsupervised}$
automatic speech recognition (ASR) system using only unpaired audio and text.
Existing unsupervised ASR methods assume that no labeled data can be used for
training. We argue that even if one does not have an