Independence assumptions during sequence generation can speed up inference, but parallel generation of highly inter-dependent tokens comes at a cost in quality. Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder by simply interleaving the two directions and adapting the word positions and self-attention masks. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer, and on five machine translation tasks and two document summarization tasks, achieves a decoding speedup of ~2X compared to autoregressive decoding with comparable quality. Notably, it outperforms left-to-right SA because the independence assumptions in IBDecoder are more felicitous. To achieve even higher speedups, we explore hybrid models where we either simultaneously predict multiple neighbouring tokens per direction, or perform multi-directional decoding by partitioning the target sequence. These methods achieve speedups to 4X-11X across different tasks at the cost of <1 BLEU or <0.5 ROUGE (on average). Source code is released at https://github.com/bzhangGo/zero.

本文提出了一种交叉双向的解码器，它能同时进行从左到右和从右到左的目标词语生成，并且相较于自回归解码等传统解码方式在保持质量的情况下能实现达到2倍的解码速度提升。此外，作者还探索了同时预测多个相邻token的方法，以及根据目标序列进行分区解码的多向模型，这些方法也能够在不损失太多准确度的情况下进一步提高解码速度。

快速交替双向序列生成