ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, error correction models are trained in a supervised manner on the decoding results of the underlying ASR system paired with the reference text. This approach is computationally intensive, and the correction model must be retrained whenever the underlying ASR model changes. Recent years have seen the development of large language models that can perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example and examine its ability to perform ASR error correction in zero-shot and 1-shot settings. We use the ASR N-best list as model input and propose two methods: unconstrained error correction and N-best constrained error correction. Results on a Conformer-Transducer model and the pre-trained Whisper model show that error correction with the powerful ChatGPT model can substantially improve ASR system performance.
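To make the two settings concrete, the following is a minimal sketch of how an N-best list might be turned into a zero-shot prompt. The prompt wording, the function names `build_prompt` and `parse_selection`, and the fallback to the 1-best hypothesis are all illustrative assumptions, not the prompts or procedure used in the paper. No model call is made; the sketch covers only the text going in and the reply coming out.

```python
import re
from typing import List


def build_prompt(nbest: List[str], constrained: bool) -> str:
    """Format an ASR N-best list as a zero-shot correction prompt.

    The instruction text is a hypothetical example, not the paper's prompt.
    """
    lines = [f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1)]
    if constrained:
        # N-best constrained: the model may only pick one of the listed hypotheses.
        task = ("Select the hypothesis that is most likely correct. "
                "Answer with its number only.")
    else:
        # Unconstrained: the model may write any transcription it considers correct.
        task = ("Using these hypotheses, write the most likely correct "
                "transcription. Answer with the transcription only.")
    return "ASR N-best hypotheses:\n" + "\n".join(lines) + "\n\n" + task


def parse_selection(reply: str, nbest: List[str]) -> str:
    """Map a constrained-mode reply to a hypothesis.

    Falls back to the 1-best hypothesis when the reply holds no valid index
    (an assumed policy for this sketch).
    """
    match = re.search(r"\d+", reply)
    if match and 1 <= int(match.group()) <= len(nbest):
        return nbest[int(match.group()) - 1]
    return nbest[0]
```

In the constrained setting the output is always one of the ASR hypotheses, so the reply can be validated mechanically. In the unconstrained setting the reply would be used directly as the corrected transcription.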

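This paper takes the ChatGPT model as an example to study its ability to perform ASR error correction in zero-shot or one-shot settings, and proposes unconstrained and N-best constrained error correction methods. The results show that error correction with the powerful ChatGPT model can substantially improve ASR system performance.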

Can Generative Large Language Models Perform ASR Error Correction?