BriefGPT.xyz
Feb, 2022
Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
TL;DR
This paper improves Transformers with a gated attention unit and a linear approximation method; the resulting model is named FLASH. It matches the perplexity of improved Transformers on both short and long sequences, while training up to 4.9× faster on autoregressive language modeling on Wiki-40B and PG-19, and 4.8× faster on masked language modeling.
Abstract
We revisit the design choices in transformers, and propose methods to address their weaknesses in handling long sequences. First, we propose a simple layer named gated attention unit, which allows the use of a weaker single-head attention with minimal quality loss.
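To make the gated attention unit concrete, here is a minimal NumPy sketch of the idea: a shared projection is cheaply transformed into queries and keys, attention uses a squared-ReLU score instead of softmax, and the attention output is gated elementwise before the output projection. All weight names and shapes here are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def gated_attention_unit(x, Wu, Wv, Wz, Wo, gamma_q, beta_q, gamma_k, beta_k):
    """Illustrative sketch of a gated attention unit (GAU).

    x: (seq_len, d_model) input. Weight shapes are assumptions:
      Wu, Wv: (d_model, e) gate/value projections
      Wz: (d_model, s) shared projection for queries and keys
      Wo: (e, d_model) output projection
      gamma_*, beta_*: (s,) per-dimension scale/offset for Q and K
    """
    n = x.shape[0]
    u = np.maximum(x @ Wu, 0.0)        # gate branch
    v = np.maximum(x @ Wv, 0.0)        # value branch
    z = np.maximum(x @ Wz, 0.0)        # shared low-dim projection
    q = z * gamma_q + beta_q           # cheap per-dim transforms yield Q and K
    k = z * gamma_k + beta_k
    scores = np.maximum(q @ k.T / n, 0.0) ** 2   # squared-ReLU attention, no softmax
    return (u * (scores @ v)) @ Wo     # elementwise gating, then output projection
```

The gating (elementwise product with `u`) is what lets a single, weaker attention head suffice: the gate compensates for the reduced expressiveness of the attention itself.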