Autoregressive sequence models achieve state-of-the-art performance in
domains like machine translation. However, because of their autoregressive
factorization, these models suffer from high latency during inference.
Recently, non-autoregressive sequence models were proposed to reduc