Voice assistants increasingly use on-device Automatic Speech Recognition (ASR) to ensure speed and privacy. However, because of resource constraints on the device, queries in complex information domains often require further processing by a search engine. For such applications, we propose a novel Transformer-based model that can both rescore and rewrite by exploring the full context of the N-best hypotheses in parallel. We also propose a new discriminative sequence training objective that works well for both the rescore and rewrite tasks. We show that our Rescore+Rewrite model outperforms the Rescore-only baseline and achieves an average relative Word Error Rate (WER) reduction of up to 8.6% over the ASR system alone.
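
To make the reported metric concrete, the following is a minimal sketch of how word error rate and a relative WER reduction are commonly computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, tokenization is assumed to be plain whitespace splitting, and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace tokenization (an illustrative simplification).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not results from the paper).
wer = word_error_rate("play the beatles", "play the beetles")  # one substitution in three words
werr = relative_wer_reduction(baseline_wer=0.10, new_wer=0.0914)
print(f"WER = {wer:.3f}, relative WER reduction = {werr:.1%}")
```

Under this definition, a relative reduction is measured against the baseline's own error rate, so a drop from 10.0% to 9.14% absolute WER corresponds to an 8.6% relative reduction.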

Transformer-based Model for N-Best Rescoring and Rewriting in Speech Recognition