TL;DR: By combining a deep-encoder/shallow-decoder architecture, multi-head attention pruning, and replacing decoder self-attention with simplified recurrent units, the paper achieves up to 109% speedup on CPU and 84% on GPU without degrading translation quality, while reducing the number of parameters by 25%.
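To make the third technique concrete, below is a minimal sketch of a simplified recurrent unit in the spirit of the SSRU (f_t = σ(W_f x_t), c_t = f_t ⊙ c_{t-1} + (1 − f_t) ⊙ W x_t, h_t = ReLU(c_t)) that can stand in for decoder self-attention. This is an illustrative assumption of the cell's form, not the paper's released code; the class and variable names are hypothetical.

```python
# Minimal SSRU-style recurrent cell sketch (hypothetical names; not the
# paper's implementation). It replaces quadratic-cost decoder self-attention
# with a cheap elementwise recurrence over time steps.
import torch
import torch.nn as nn

class SSRU(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.w = nn.Linear(d_model, d_model)    # candidate-state projection
        self.w_f = nn.Linear(d_model, d_model)  # forget-gate projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        f = torch.sigmoid(self.w_f(x))   # forget gates for all steps at once
        z = self.w(x)                    # candidate states for all steps
        c = torch.zeros_like(x[:, 0])    # initial cell state c_0 = 0
        outputs = []
        for t in range(x.size(1)):
            # c_t = f_t * c_{t-1} + (1 - f_t) * z_t  (elementwise recurrence)
            c = f[:, t] * c + (1.0 - f[:, t]) * z[:, t]
            outputs.append(torch.relu(c))  # h_t = ReLU(c_t)
        return torch.stack(outputs, dim=1)
```

At decode time the recurrence carries a single cell state per layer, so each new token costs O(d_model) rather than attending over all previous positions, which is where the CPU/GPU speedups reported above come from.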
Abstract
Large transformer models have achieved state-of-the-art results in neural
machine translation and have become standard in the field. In this work, we
look for the optimal combination of known techniques to optimize inference speed without degrading translation quality.