Streaming Confusion Network Speech Recognition

June 2023

Streaming Speech-to-Confusion Network Speech Recognition

Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke

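TL;DR: This paper proposes a novel streaming automatic speech recognition architecture that outputs a confusion network while keeping latency bounded, as interactive applications require. Its 1-best results are on par with a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring that achieves 10-20% lower word error rate on the LibriSpeech task. The model also outperforms a strong RNN-T baseline on a far-field voice assistant task.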

Abstract

In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we