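TL;DR: This paper proposes a hybrid distillation algorithm based on adaptive learning to further improve the generation quality of BANG. Experiments show that the method is effective and does not affect inference latency. Compared with BANG, it significantly improves BLEU scores, and compared with autoregressive generation methods, it achieves a speedup of more than 7x.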
Abstract
Non-autoregressive generation is a sequence generation paradigm that removes the dependency between target tokens. It can efficiently reduce text generation latency by using parallel decoding in place of token-by-token sequential decoding. However, due to the known multi-modality p