May, 2018
Accelerating Neural Transformer via an Average Attention Network
Biao Zhang, Deyi Xiong, Jinsong Su
TL;DR
Replaces the self-attention network in the neural Transformer decoder with an average attention network to address the slow decoding that self-attention causes there, enabling faster sentence decoding and improving the speed of translation while preserving performance.
Abstract
With parallelizable attention networks, the neural Transformer is very fast to train. However, due to the auto-regressive architecture and self-attention in the decoder, the decoding procedure becomes slow.
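The summary above describes replacing decoder self-attention with an average attention network, whose core operation is a cumulative average over all previous positions. A minimal sketch of that operation (function name and NumPy layout are assumptions; the full model additionally applies a feed-forward layer and a gating mechanism on top of the averaged representations):

```python
import numpy as np

def average_attention(x):
    """Cumulative-average 'attention': position j attends uniformly to
    all positions k <= j, replacing decoder self-attention.
    x: (seq_len, d_model) array of decoder-side representations."""
    seq_len = x.shape[0]
    # Running sum over positions, divided by the position index,
    # gives the average of all inputs seen so far.
    cum = np.cumsum(x, axis=0)
    denom = np.arange(1, seq_len + 1)[:, None]
    return cum / denom
```

Because the average at step j can be updated from step j-1 in constant time (avg_j = ((j-1) * avg_{j-1} + x_j) / j), decoding avoids the per-step attention over all previous positions that makes self-attention decoders slow.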