BriefGPT.xyz
Dec, 2022
A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey
TL;DR
This paper proposes a modification to the internal structure of Transformer layers: the multi-head attention sub-layer and the MLP sub-layer are arranged in parallel, and higher-order integration schemes from neural ODE solvers are applied, improving the performance of Transformer networks on several tasks.
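The idea summarized above can be sketched numerically. In a minimal sketch (with toy stand-in functions, not the paper's actual sub-layers), a residual block x + f(x) is read as one explicit Euler step of dx/dt = f(x); placing attention and MLP in parallel means both read the same input and their outputs are summed into one vector field f, which can then be integrated with a higher-order scheme such as the midpoint (RK2) rule:

```python
import numpy as np

# Hypothetical toy functions standing in for the multi-head attention
# and MLP sub-layers; the real model would use learned modules.
def attn(x):
    return 0.1 * np.tanh(x)

def mlp(x):
    return 0.1 * np.tanh(2.0 * x)

def f(x):
    # Parallel arrangement: both sub-layers see the same input and
    # their outputs are summed into a single vector field f(x).
    return attn(x) + mlp(x)

def euler_layer(x, h=1.0):
    # A standard residual block x + f(x) is one explicit Euler step
    # of dx/dt = f(x) with step size h = 1.
    return x + h * f(x)

def rk2_layer(x, h=1.0):
    # Midpoint (second-order Runge-Kutta) step over the same vector
    # field: an example of a higher-order integration scheme.
    mid = x + 0.5 * h * f(x)
    return x + h * f(mid)

x0 = np.array([1.0, -0.5, 0.25])
print(euler_layer(x0))
print(rk2_layer(x0))
```

Both "layers" share the same parameters (here, the same functions); the integration scheme alone changes, which is what lets an ODE-solver view reuse one set of weights across finer or higher-order steps.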
Abstract
Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective …