May, 2024
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Mark Lowell, Catharine Kastner
TL;DR
By training neural networks with an exponential Euler solver to accurately approximate the true gradient descent dynamical system, the authors show that the rise in the sharpness of the Hessian matrix is caused by alignment of the network's layerwise Jacobian matrices, and that the degree of alignment follows a power law in dataset size, with correlation coefficients between 0.74 and 0.98.
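To make the notion of layerwise Jacobian alignment concrete, here is a minimal illustrative sketch (our own construction, not the paper's code): in a deep linear network the Jacobian of each layer is its weight matrix, and we can quantify alignment between adjacent layers as the overlap between the top left singular vector of layer l and the top right singular vector of layer l+1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative only: in a deep linear network f(x) = W3 @ W2 @ W1 @ x, each
# layer's Jacobian is its weight matrix. "Alignment" here is the overlap
# between the top output direction of layer l and the top input direction
# of layer l+1 (this metric is our simplification, not the paper's exact one).
def layerwise_alignment(weights):
    scores = []
    for W_l, W_next in zip(weights, weights[1:]):
        u_l = np.linalg.svd(W_l)[0][:, 0]        # top left singular vector of layer l
        v_next = np.linalg.svd(W_next)[2][0, :]  # top right singular vector of layer l+1
        scores.append(abs(u_l @ v_next))
    return scores

# Random weights: overlaps are typically far from 1.
random_weights = [rng.standard_normal((8, 8)) for _ in range(3)]
print(layerwise_alignment(random_weights))

# Explicitly aligned weights: each layer's top input direction is constructed
# to match the previous layer's top output direction, so every overlap is 1.
aligned, prev_Q = [], np.eye(8)
for _ in range(3):
    Q = np.linalg.qr(rng.standard_normal((8, 8)))[0]
    aligned.append(Q @ np.diag([5.0] + [1.0] * 7) @ prev_Q.T)
    prev_Q = Q
print(layerwise_alignment(aligned))
```

In the aligned construction, the product of the layer matrices has a much larger top singular value than any individual layer, which is the mechanism by which alignment drives up sharpness.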
Abstract
During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even nonstochastic