Non-autoregressive transformers (NATs) are a family of text generation models
that reduce decoding latency by predicting the whole sentence in parallel.
However, this latency reduction sacrifices the ability to model dependencies
among output tokens, which typically degrades generation quality.
We propose a Transformer-based, lexicon-aware non-autoregressive ASR framework that can be trained jointly on speech and text data and relaxes the conditional independence assumption, achieving faster decoding with strong performance. Experimental results show that the model outperforms other recently proposed non-autoregressive ASR models, is simpler than most of them, and decodes 58 times faster than a classic autoregressive model.
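To make the latency contrast concrete, here is a minimal toy sketch (not the paper's model or any real ASR system): autoregressive decoding requires one forward pass per output token, each conditioned on the prefix, while non-autoregressive decoding predicts every position in a single call under a conditional independence assumption. The `step_fn` and `parallel_fn` callables are hypothetical stand-ins for real model invocations.

```python
def ar_decode(step_fn, length):
    """Autoregressive: `length` sequential calls, each conditioned on the prefix."""
    out = []
    for _ in range(length):
        out.append(step_fn(out))  # next token depends on tokens emitted so far
    return out

def nat_decode(parallel_fn, length):
    """Non-autoregressive: one call predicts all positions in parallel,
    assuming conditional independence between output tokens."""
    return parallel_fn(length)

# Hypothetical stand-ins for real model forward passes.
step_fn = lambda prefix: len(prefix)    # token i is simply i
parallel_fn = lambda n: list(range(n))  # all n tokens at once

# Both produce the same sequence here, but ar_decode made 5 calls
# while nat_decode made 1 -- the source of NAT's speedup.
assert ar_decode(step_fn, 5) == nat_decode(parallel_fn, 5) == [0, 1, 2, 3, 4]
```

The speedup comes from replacing a length-proportional chain of dependent calls with a single batched pass; the cost, as noted above, is that outputs can no longer condition on each other.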