BriefGPT.xyz
Sep, 2023
Sharpness-Aware Minimization and the Edge of Stability
Philip M. Long, Peter L. Bartlett
TL;DR
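Recent experiments have shown that when a neural network is trained with gradient descent, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, where $\eta$ is the step size, and then fluctuates around that value. We carry out a similar calculation for Sharpness-Aware Minimization (SAM) and obtain an edge of stability that depends on the norm of the gradient. Empirical validation on three deep learning training tasks shows that SAM operates at the edge of stability identified by this analysis.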
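To make the two thresholds concrete, here is a minimal sketch on a one-dimensional quadratic loss $\ell(w) = \lambda w^2 / 2$, where the curvature $\lambda$ plays the role of the Hessian operator norm. Everything here is an illustrative assumption rather than the authors' code or derivation: the function names, the toy loss, and the use of SAM with a normalized ascent step of radius `rho`. On this toy problem a GD step multiplies $w$ by $1 - \eta\lambda$, so it contracts exactly when $\lambda < 2/\eta$; a SAM step multiplies $w$ by $1 - \eta\lambda(1 + \rho\lambda/\|g\|)$, which gives a threshold that depends on the gradient norm $\|g\|$.

```python
import math


def gd_step(w, lam, eta):
    """One gradient-descent step on the toy loss 0.5 * lam * w**2."""
    return w - eta * lam * w


def sam_step(w, lam, eta, rho):
    """One SAM step on the same toy loss (normalized ascent step, assumed variant)."""
    g = lam * w
    if g == 0.0:
        return w  # already at the minimum; the ascent direction is undefined
    w_perturbed = w + rho * g / abs(g)  # move distance rho along the gradient
    return w - eta * lam * w_perturbed  # descend using the gradient at the perturbed point


def gd_edge(eta):
    """Curvature above which a GD step stops contracting on the toy loss."""
    return 2.0 / eta


def sam_edge(eta, rho, grad_norm):
    """Curvature above which a SAM step stops contracting on the toy loss.

    Solves eta * lam * (1 + rho * lam / grad_norm) = 2 for lam > 0.
    """
    a = rho / grad_norm
    return (math.sqrt(1.0 + 8.0 * a / eta) - 1.0) / (2.0 * a)


if __name__ == "__main__":
    eta, rho, w = 0.1, 0.1, 1.0
    for lam in (10.0, 19.0, 21.0):
        grad_norm = lam * abs(w)
        print(
            f"lam={lam:5.1f}  GD edge={gd_edge(eta):6.2f}  "
            f"SAM edge={sam_edge(eta, rho, grad_norm):6.2f}  "
            f"|GD step|={abs(gd_step(w, lam, eta)):.3f}  "
            f"|SAM step|={abs(sam_step(w, lam, eta, rho)):.3f}"
        )
```

In this sketch the SAM threshold is always below $2/\eta$ and shrinks as the gradient norm shrinks, so a curvature that GD tolerates can already be unstable for SAM. The toy model says nothing about the three deep learning tasks mentioned above; it only illustrates how a gradient-norm-dependent edge can arise.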
Abstract
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the
→