For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide a learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on the clean and noisy test sets, respectively, relative to a fully supervised model on the same data.
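
The "recovers 54% and 73% of the word error rate" figures are easiest to read as a recovery ratio. The sketch below assumes the usual definition: the share of the WER gap between the labeled-only baseline and the fully supervised model that the semi-supervised model closes. The abstract does not spell out this formula, so the function name, its arguments, and the example numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def wer_recovery(baseline_wer: float,
                 semi_supervised_wer: float,
                 supervised_wer: float) -> float:
    """Fraction of the baseline-to-supervised WER gap closed by semi-supervised training.

    Assumed definition (not stated in the abstract):
        (baseline - semi_supervised) / (baseline - supervised)
    """
    gap = baseline_wer - supervised_wer
    if gap <= 0:
        raise ValueError("the fully supervised model must have a lower WER than the baseline")
    return (baseline_wer - semi_supervised_wer) / gap


# Hypothetical WERs in percent, not results from the paper.
print(f"{wer_recovery(10.0, 7.0, 5.0):.0%}")
```

Under this reading, a recovery of 100% would mean the semi-supervised model matches the fully supervised one, and 0% would mean no improvement over the baseline.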

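This paper proposes local prior matching (LPM), a semi-supervised learning objective for training speech recognition models: knowledge is distilled from a strong prior (e.g. a language model) to provide a learning signal to a discriminative model on unlabeled speech. The paper shows that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable experimental settings. Trained on 100 hours of labeled speech plus an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on the clean and noisy test sets, respectively, relative to a fully supervised model trained on the same data.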

Semi-Supervised Speech Recognition via Local Prior Matching