Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar...
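TL;DR: This work adds a Listen, Attend and Spell (LAS) model as a second pass, bringing the quality of an end-to-end streaming model up to that of a conventional speech recognition system while still meeting constraints on computation and response latency.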
Abstract
The requirements for many applications of state-of-the-art speech recognition
systems include not only low word error rate (WER) but also low latency.
Specifically, for many use-cases, the system must be able to
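This paper studies how a deliberation network, which attends to both the acoustic features and the first-pass text hypotheses, can improve two-pass models for ASR. In comparison experiments on Google Voice Search, it achieves a 12% relative improvement over LAS rescoring, and a 23% improvement on a proper-noun test set. Compared with a large conventional model, the best model performs 21% better on Voice Search (VS).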
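
To make the idea of attending to two sources concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the dot-product scoring, the function names (`attend`, `two_source_context`), the toy vectors, and the choice to concatenate the two context vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention: return the attention-weighted average of `keys`."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def two_source_context(query, acoustic_frames, hypothesis_tokens):
    """Attend separately to acoustic encodings and first-pass hypothesis
    encodings, then concatenate the two context vectors (an assumed way
    of combining them)."""
    acoustic_ctx = attend(query, acoustic_frames)
    text_ctx = attend(query, hypothesis_tokens)
    return acoustic_ctx + text_ctx  # list concatenation


# Hypothetical toy inputs: one 2-d decoder query, 3 acoustic frames,
# and 2 first-pass hypothesis token encodings.
query = [1.0, 0.0]
acoustic_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hypothesis_tokens = [[0.0, 1.0], [1.0, 1.0]]

context = two_source_context(query, acoustic_frames, hypothesis_tokens)
print(len(context))  # 2 dims from each source
```

In a real system the queries and encodings would come from trained networks, and the combined context would feed a second-pass decoder. The sketch only shows how one decoder state can draw on both sources at once.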