BriefGPT.xyz
Nov, 2022
Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen, Behnam Neyshabur, Harsh Mehta
TL;DR
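This paper studies the problem of training transformer networks and proposes a new convex-analytic approach to address it, which in turn yields a theoretical interpretation of these models and a way to improve their optimization.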
Abstract
Understanding the fundamental mechanism behind the success of transformer networks is still an open problem in the deep learning literature. Although their remarkable performance has been mostly attributed to the …