Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment

May, 2024

Mark Lowell, Catharine Kastner

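TL;DR: The authors train neural networks with an exponential Euler solver so that training accurately approximates the true gradient descent dynamical system. They show that the rise in the sharpness of the Hessian matrix is caused by the network's layerwise Jacobian matrices becoming aligned, and that the degree of alignment scales with dataset size as a power law, with a coefficient of determination between 0.74 and 0.98.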
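As a toy illustration of why a solver other than plain gradient descent is useful here, the sketch below compares the two on a quadratic loss with a diagonal Hessian. It is not the paper's solver or experimental setup: the loss, the curvature values, the step size, and all function names are assumptions chosen for illustration. On this linear problem an exponential Euler step solves the continuous-time gradient dynamics exactly, while gradient descent diverges once the sharpness (largest Hessian eigenvalue) exceeds 2 divided by the step size.

```python
import math

def loss(theta, curvatures):
    # Quadratic loss L = 0.5 * sum(c_i * theta_i^2); its Hessian is diag(curvatures).
    return 0.5 * sum(c * t * t for t, c in zip(theta, curvatures))

def gd_step(theta, curvatures, lr):
    # Plain gradient descent: the gradient of L along coordinate i is c_i * theta_i.
    return [t - lr * c * t for t, c in zip(theta, curvatures)]

def exp_euler_step(theta, curvatures, lr):
    # Exponential Euler on d(theta_i)/dt = -c_i * theta_i.
    # For a linear system this is the exact flow over a time interval of length lr.
    return [math.exp(-lr * c) * t for t, c in zip(theta, curvatures)]

def run(step, theta, curvatures, lr, n_steps):
    # Apply the chosen update rule n_steps times and report the final loss.
    for _ in range(n_steps):
        theta = step(theta, curvatures, lr)
    return loss(theta, curvatures)

# Hypothetical values: sharpness 3.0 is above the stability threshold 2 / lr = 2.0.
curvatures = [0.5, 3.0]
lr = 1.0
theta0 = [1.0, 1.0]

sharpness = max(curvatures)
initial_loss = loss(theta0, curvatures)
gd_loss = run(gd_step, theta0, curvatures, lr, 20)
ee_loss = run(exp_euler_step, theta0, curvatures, lr, 20)

print("sharpness:", sharpness, "threshold 2/lr:", 2.0 / lr)
print("gradient descent loss after 20 steps:", gd_loss)
print("exponential Euler loss after 20 steps:", ee_loss)
```

Along the sharp direction, each gradient descent step multiplies the parameter by 1 - 3 = -2, so the loss grows, whereas the exponential Euler step multiplies it by exp(-3) and the loss decays. A real network is not quadratic, so this only conveys the stability argument, not the training behavior reported in the paper.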

Abstract

During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even nonstochastic →