Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence and can further reduce the word error rate (WER). Although an ASR system generates multiple candidates through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference: it consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, handling multiple source sentences raises two issues. First, it is non-trivial to leverage the voting effect from multiple source sentences, since they usually vary in length. Thus, we propose a novel alignment algorithm that maximizes the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can only take one adjusted source sentence as input, while there are multiple source sentences. Thus, we develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our in-house dataset and AISHELL-1 show that FastCorrect 2 further reduces the WER over the previous correction model with a single candidate by 3.2% and 2.6%, respectively, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 also achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR.
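The duration-based adjustment of the source sentence can be illustrated with a minimal Python sketch. It assumes, as in duration-based non-autoregressive models generally, that each source token's predicted duration is a non-negative integer counting how many target positions that token occupies: 0 deletes the token, 1 keeps one slot, and values above 1 open extra slots for insertions. The function name `adjust_source` and the example durations are hypothetical; in FastCorrect 2 the durations come from a learned predictor, which is not shown here.

```python
from typing import List, Sequence


def adjust_source(tokens: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Build the decoder input by repeating each source token by its duration.

    Hypothetical helper: a duration of 0 drops the token (deletion),
    1 keeps it, and values greater than 1 leave room for insertions.
    """
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    adjusted: List[str] = []
    for token, duration in zip(tokens, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Repeat the token once per target position it is predicted to cover.
        adjusted.extend([token] * duration)
    return adjusted
```

For example, `adjust_source(["I", "lick", "the", "the", "cat"], [1, 1, 1, 0, 1])` returns `["I", "lick", "the", "cat"]`: the duplicated "the" is dropped, and a non-autoregressive decoder can then predict the target token at all four positions in parallel (for instance, replacing "lick" with "like") instead of generating one token at a time.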

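This paper proposes FastCorrect 2, an error correction model that takes multiple candidates as input to improve correction accuracy. FastCorrect 2 uses non-autoregressive generation for fast inference: an encoder processes the multiple source sentences, and a decoder generates the target sentence in parallel from a source sentence adjusted according to the predicted duration of each source token. It also introduces a new alignment algorithm, which maximizes token alignment among the multiple sentences in terms of token and pronunciation similarity, and a candidate predictor, which selects the most suitable candidate for the decoder. Experimental results show that FastCorrect 2 further reduces the WER compared with the previous correction model that uses a single candidate, is more effective than the cascaded re-scoring and correction pipeline, and can serve as a unified post-processing module for ASR.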
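The alignment step can be pictured with a minimal pairwise sketch. This is not the paper's algorithm, whose details are not given here: it is a standard Needleman-Wunsch-style dynamic program, swapped in for illustration, that aligns one candidate to an anchor candidate by scoring identical tokens as 1, differing tokens by a pronunciation similarity, and gaps by `gap_score`, then fills the gaps with a padding symbol so both sequences end up the same length. `toy_pron_sim`, which treats two words as 0.5 similar when they share a first letter, is a hypothetical stand-in for a real pronunciation model, and `GAP` is a hypothetical padding token.

```python
from typing import Callable, List, Sequence, Tuple

GAP = "<pad>"  # hypothetical padding token inserted where a candidate has no match


def toy_pron_sim(a: str, b: str) -> float:
    """Hypothetical pronunciation similarity: 0.5 if the words share a first letter."""
    return 0.5 if a and b and a[0] == b[0] else 0.0


def align(
    anchor: Sequence[str],
    other: Sequence[str],
    pron_sim: Callable[[str, str], float],
    gap_score: float = 0.0,
) -> List[Tuple[str, str]]:
    """Align `other` to `anchor`, maximizing total token/pronunciation similarity."""
    n, m = len(anchor), len(other)

    def pair_score(a: str, b: str) -> float:
        # Exact token match scores highest; otherwise fall back to pronunciation.
        return 1.0 if a == b else pron_sim(a, b)

    # score[i][j]: best total score for aligning anchor[:i] with other[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_score
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_score
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(anchor[i - 1], other[j - 1]),
                score[i - 1][j] + gap_score,  # anchor token has no partner
                score[i][j - 1] + gap_score,  # other token has no partner
            )

    # Trace back from the bottom-right cell, preferring pairings on ties.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(anchor[i - 1], other[j - 1]):
            pairs.append((anchor[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_score:
            pairs.append((anchor[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, other[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

Aligning `["I", "like", "the", "cat"]` with `["I", "lick", "cat"]` pairs "like" with the similar-sounding "lick" and pads the missing "the", so each position holds comparable tokens from both candidates and their agreement or disagreement becomes visible position by position. Handling more than two candidates would require aligning each one to the same anchor and merging the results, which this sketch does not attempt.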

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition