May, 2024
Attention as an RNN
Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio...
TL;DR
Transformers marked a major breakthrough in sequence modelling, but they carry a high computational cost. This paper proposes a new, efficient way to compute attention and introduces an attention-based module called Aaren, which can be trained in parallel like a Transformer while updating efficiently with new tokens like a traditional RNN. Across a range of sequence problems, Aaren achieves comparable performance with better time and memory efficiency.
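The key observation behind viewing attention as an RNN is that, for a single query, softmax attention over a stream of tokens can be folded into a constant-size recurrent state (a running max of the scores, a rescaled numerator, and a denominator), so each new token costs O(1) to absorb. The NumPy snippet below is a minimal sketch of that recurrence under these assumptions; it is not the authors' implementation, and all function names are illustrative.

```python
import numpy as np

def attention_rnn_step(state, query, key, value):
    """One RNN-style update: fold a new (key, value) token into the state.

    state = (m, num, den): running max of attention scores, rescaled
    numerator (weighted sum of values), and rescaled denominator
    (sum of exponentiated scores). Hypothetical helper, for illustration.
    """
    m, num, den = state
    s = query @ key                    # attention score for the new token
    m_new = max(m, s)                  # updated running max (for stability)
    scale = np.exp(m - m_new)          # rescale the old statistics
    num = num * scale + np.exp(s - m_new) * value
    den = den * scale + np.exp(s - m_new)
    return (m_new, num, den)

def attention_over_stream(query, keys, values):
    """Process tokens one at a time; the final output equals softmax attention."""
    state = (-np.inf, np.zeros(values.shape[1]), 0.0)
    for k, v in zip(keys, values):
        state = attention_rnn_step(state, query, k, v)
    m, num, den = state
    return num / den

# Sanity check against the standard parallel computation.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(16, 8)), rng.normal(size=(16, 4))
w = np.exp(K @ q - np.max(K @ q))
assert np.allclose(attention_over_stream(q, K, V), (w / w.sum()) @ V)
```

Because each step only rewrites a fixed-size state, a new token can be incorporated without recomputing attention over the whole prefix, which is what gives the RNN-like efficiency the TL;DR describes; the same recurrence is associative, which is what permits parallel (scan-based) training.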
Abstract
The advent of transformers marked a significant breakthrough in sequence modelling, providing a highly performant architecture capable of leveraging GPU parallelism. However, …