BriefGPT.xyz
Feb, 2024
Provably learning a multi-head attention layer
Sitan Chen, Yuanzhi Li
TL;DR
An algorithm for learning a multi-head attention layer from random examples, giving the first nontrivial upper and lower bounds for this problem.
Abstract
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. Given a sequence length $k$, attention matrices $\mathbf{\Theta}$ …
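The abstract defines the layer being learned only up to the truncation above. As a point of reference, a minimal numpy sketch of one common bilinear parameterization of a multi-head attention layer, $F(\mathbf{X}) = \sum_i \mathrm{softmax}(\mathbf{X}\mathbf{\Theta}_i\mathbf{X}^\top)\,\mathbf{X}\mathbf{W}_i$, is shown below; the function name and the exact conventions (e.g. scaling, masking) are assumptions here, not taken from the paper.

```python
import numpy as np

def softmax_rows(A):
    # Row-wise softmax with max-subtraction for numerical stability.
    A = A - A.max(axis=-1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Thetas, Ws):
    """Hypothetical sketch: F(X) = sum_i softmax(X Theta_i X^T) X W_i.

    X: (k, d) array of k tokens; Thetas, Ws: lists of (d, d) matrices,
    one attention matrix and one projection matrix per head.
    """
    k, d = X.shape
    out = np.zeros((k, d))
    for Theta, W in zip(Thetas, Ws):
        attn = softmax_rows(X @ Theta @ X.T)  # (k, k) attention pattern
        out += attn @ X @ W                   # attend over tokens, then project
    return out
```

With $\mathbf{\Theta}_i = \mathbf{0}$ every row of the attention pattern is uniform, so each head simply averages the tokens before projecting, which is a quick sanity check on the implementation.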