BriefGPT.xyz
Jun, 2024
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim
TL;DR
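Linear transformers trained with the delta rule, and hybrid models that combine them with sliding-window and global attention layers, perform strongly on language modeling and downstream tasks.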
Abstract
Transformers with linear attention (i.e., linear transformers) and stat…