BriefGPT.xyz
May 2023
Generalizing Adam To Manifolds For Efficiently Training Transformers
Benedikt Brantner
TL;DR
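By exploiting special structure (such as the Stiefel manifold, the symplectic Stiefel manifold, the Grassmann manifold, and the symplectic Grassmann manifold) to reduce the dimensionality of the neural-network optimization problem, the paper generalizes the Adam algorithm to manifolds and uses it to train Transformers, which effectively speeds up training.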
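The TL;DR does not spell out an update rule. As a rough illustration of what "Adam on a manifold" can mean, here is a minimal NumPy sketch for the Stiefel manifold St(n, p) = {Y : YᵀY = I}. It is not the paper's construction: it swaps in a simple projection-based scheme (the Euclidean gradient is projected onto the tangent space, the Adam moments are kept in the ambient space, a QR retraction maps the step back onto the manifold, and the first moment is re-projected at the new point). The class name `StiefelAdamSketch`, the helper functions, and the test problem (recovering a dominant eigenspace by minimizing -tr(YᵀAY)) are hypothetical choices made for this example.

```python
import numpy as np


def sym(a: np.ndarray) -> np.ndarray:
    """Symmetric part of a square matrix."""
    return 0.5 * (a + a.T)


def project_tangent(y: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Orthogonal projection of g onto the tangent space of St(n, p) at y
    (embedded Euclidean metric): P_y(g) = g - y sym(y^T g)."""
    return g - y @ sym(y.T @ g)


def qr_retract(x: np.ndarray) -> np.ndarray:
    """Map a full-rank n x p matrix onto St(n, p) via the Q factor of its
    thin QR decomposition, with signs fixed so that diag(R) > 0."""
    q, r = np.linalg.qr(x)
    signs = np.where(np.diag(r) < 0, -1.0, 1.0)
    return q * signs  # flips the sign of each column where needed


class StiefelAdamSketch:
    """Hypothetical projection-based Adam variant on St(n, p).

    NOT the method from the paper; a simple stand-in for illustration.
    """

    def __init__(self, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first moment (ambient space)
        self.v = None  # second moment (ambient space, elementwise)
        self.t = 0

    def step(self, y: np.ndarray, euclidean_grad: np.ndarray) -> np.ndarray:
        if self.m is None:
            self.m = np.zeros_like(y)
            self.v = np.zeros_like(y)
        self.t += 1
        g = project_tangent(y, euclidean_grad)  # Riemannian gradient
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g  # heuristic
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Adam-scaled direction, projected back to the tangent space at y.
        direction = project_tangent(y, m_hat / (np.sqrt(v_hat) + self.eps))
        y_new = qr_retract(y - self.lr * direction)
        # Projection-based "vector transport" of the first moment.
        self.m = project_tangent(y_new, self.m)
        return y_new


# Usage: find a dominant p-dimensional eigenspace of a symmetric matrix a.
rng = np.random.default_rng(0)
n, p = 8, 3
a = sym(rng.standard_normal((n, n)))


def loss(y: np.ndarray) -> float:
    return -np.trace(y.T @ a @ y)


def grad(y: np.ndarray) -> np.ndarray:
    return -2.0 * a @ y  # Euclidean gradient of -tr(y^T a y), a symmetric


y = qr_retract(rng.standard_normal((n, p)))
initial_loss = loss(y)
opt = StiefelAdamSketch(lr=1e-2)
for _ in range(500):
    y = opt.step(y, grad(y))
final_loss = loss(y)
ortho_error = np.linalg.norm(y.T @ y - np.eye(p))  # stays at round-off level
```

The elementwise second moment is the crude part of this sketch: it treats tangent vectors at different points as if they lived in one fixed vector space, which is only approximately true on a curved manifold.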
Abstract
One of the primary reasons behind the success of neural networks has been the emergence of an array of new, highly successful optimizers, perhaps most importantly the Adam optimizer. It is widely used for training…
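For reference, the Euclidean Adam update that the paper sets out to generalize keeps exponential moving averages of the gradient and of its elementwise square, corrects both for their zero initialization, and scales each step coordinate-wise. The minimal NumPy sketch below follows the standard formulation of Kingma and Ba; the function name `adam_minimize`, its default hyperparameters, and the quadratic test objective are assumptions made for illustration only.

```python
import numpy as np


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Plain (Euclidean) Adam; a hypothetical helper for illustration."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # coordinate-wise scaled step
    return x


# Example: minimize f(x) = ||x - c||^2, whose minimizer is c.
c = np.array([3.0, -1.0, 0.5])
x_star = adam_minimize(lambda x: 2.0 * (x - c), x0=np.zeros(3))
```

The coordinate-wise division by sqrt(v_hat) is what makes the update depend on a fixed basis of a flat vector space, which is why it does not transfer directly to a curved parameter space.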