Sep, 2022
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alex Damian, Eshaan Nichani, Jason D. Lee
TL;DR
This work finds that gradient descent at the edge of stability exhibits a self-stabilizing implicit bias, whose dynamics can be described by projected gradient descent; the paper gives detailed predictions of the loss, sharpness, and deviation over the course of training and verifies them empirically.
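One way to read the projected-gradient-descent description is that, once the sharpness reaches the stability boundary, the effective update drops the component of the loss gradient along the sharpness gradient, so the loss keeps decreasing without pushing the sharpness further past the threshold. The sketch below is a minimal illustration of that picture, not the authors' code; the function name, the constraint check, and the toy vectors are assumptions for illustration.

```python
import numpy as np

def projected_gd_step(theta, grad_loss, grad_sharpness, sharpness, lr):
    """One projected-gradient-descent step (illustrative sketch).

    When the constraint S(theta) <= 2/lr is active, remove the component of
    the loss gradient along grad S(theta), so the step lowers the loss
    without increasing the sharpness past the threshold.
    """
    if sharpness >= 2.0 / lr:
        u = grad_sharpness / (np.linalg.norm(grad_sharpness) + 1e-12)
        grad_loss = grad_loss - np.dot(grad_loss, u) * u
    return theta - lr * grad_loss

# Toy usage with made-up vectors: the constraint is active, so the update
# ignores the gradient component along the sharpness gradient.
theta = np.array([1.0, 1.0])
new_theta = projected_gd_step(theta,
                              grad_loss=np.array([0.5, 0.2]),
                              grad_sharpness=np.array([1.0, 0.0]),
                              sharpness=25.0, lr=0.1)
print(new_theta)  # only the second coordinate moves
```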
Abstract
Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is "stable" and the training loss decreases monotonically. …
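For intuition, the classical $2/\eta$ threshold can be checked on a one-dimensional quadratic. The sketch below is plain Python (not code from the paper): it runs gradient descent on $L(\theta) = \tfrac{1}{2} S \theta^2$ with sharpness just below and just above $2/\eta$.

```python
def gd_on_quadratic(sharpness, lr, steps=50, theta0=1.0):
    """Gradient descent on L(theta) = 0.5 * sharpness * theta**2.

    Each update is theta <- (1 - lr * sharpness) * theta, which contracts
    exactly when sharpness < 2 / lr -- the classical stability condition.
    """
    theta = theta0
    for _ in range(steps):
        theta -= lr * sharpness * theta   # gradient step: dL/dtheta = S * theta
    return 0.5 * sharpness * theta**2     # final loss

eta = 0.01
print(gd_on_quadratic(sharpness=1.9 / eta, lr=eta))  # S < 2/eta: loss shrinks toward 0
print(gd_on_quadratic(sharpness=2.1 / eta, lr=eta))  # S > 2/eta: loss blows up
```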