BriefGPT.xyz
May, 2023
SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang, Stephen J. Roberts
TL;DR
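This paper studies the Hessian matrix of neural networks during training, proposes SANE for model comparison, and investigates how the Hessian shifts under large learning rates and how those shifts affect deep neural networks.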
Abstract
Modern neural networks are undeniably successful. Numerous studies have investigated how the curvature of loss landscapes can affect the quality of solutions. In this work we consider the Hessian matrix during ne…