BriefGPT.xyz
Jun, 2023
Margin Maximization in Attention Mechanism
Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
TL;DR
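This study examines the attention mechanism as a token-separation mechanism, shows that running gradient descent converges to a max-margin solution, and provides a broad regularization-path analysis.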
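The TL;DR's central claim can be illustrated on a toy problem. This is a minimal sketch under simplifying assumptions, not the paper's model or code. A single trainable query-key vector `p` scores three hypothetical 2-D token embeddings through softmax attention. Gradient descent then minimizes an assumed cross-entropy loss for attending to one designated token. For these tokens, the hard-margin problem min ‖q‖ subject to (x_opt − x_t)·q ≥ 1 has solution q = (1, 0), with both constraints active. The iterates should grow in norm while their direction approaches that solution.

```python
import math

# Hypothetical toy token embeddings: row OPT is the token attention should
# select; the other rows are competing tokens.
X = [(1.0, 0.0), (0.0, 1.0), (0.0, -2.0)]
OPT = 0


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(p):
    """Softmax attention weights, using score x_t . p for each token x_t."""
    return softmax([x[0] * p[0] + x[1] * p[1] for x in X])


def grad(p):
    """Gradient of the (assumed, illustrative) loss -log softmax(X p)[OPT].

    It equals X^T s - x_opt, where s are the current attention weights.
    """
    s = attention_probs(p)
    gx = sum(si * x[0] for si, x in zip(s, X)) - X[OPT][0]
    gy = sum(si * x[1] for si, x in zip(s, X)) - X[OPT][1]
    return gx, gy


def run_gd(steps=5000, lr=0.5):
    """Plain gradient descent on p, starting from the origin."""
    p = (0.0, 0.0)
    for _ in range(steps):
        gx, gy = grad(p)
        p = (p[0] - lr * gx, p[1] - lr * gy)
    return p


def svm_direction_cosine(p, p_svm=(1.0, 0.0)):
    """Cosine between p and the hard-margin solution of
    min ||q|| s.t. (x_opt - x_t) . q >= 1 for all t != OPT,
    which for the toy X above is q = (1, 0)."""
    dot = p[0] * p_svm[0] + p[1] * p_svm[1]
    return dot / (math.hypot(*p) * math.hypot(*p_svm))


p = run_gd()
print("||p|| =", math.hypot(*p))
print("attention on selected token =", attention_probs(p)[OPT])
print("cosine with max-margin direction =", svm_direction_cosine(p))
```

The selected token can be separated from the others, so this loss has no finite minimizer. ‖p‖ therefore keeps growing slowly, and the softmax saturates onto the selected token. Only the direction of `p` converges, and it converges toward the max-margin solution.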
Abstract
The attention mechanism is a central component of the transformer architecture, which led to the phenomenal success of large language models. However, the theoretical principles underlying the