Feb 2024
On the Stability of Gradient Descent for Large Learning Rate
Alexandru Crăciun, Debarghya Ghoshdastidar
TL;DR
In this paper, we prove that for linear neural networks optimized with a quadratic loss, the gradient descent map is non-singular, the set of global minimizers of the loss forms a smooth manifold, and the stable minima form a bounded subset of parameter space. Moreover, we show that if the step size is too large, the set of initializations from which gradient descent converges to a critical point has measure zero.
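To make the role of the step size concrete, here is a minimal sketch (not from the paper; the quadratic L(w) = (λ/2)·w² and all constants are illustrative assumptions) showing that gradient descent on a quadratic contracts only when the step size stays below 2/λ:

```python
def gd_on_quadratic(lam, eta, w0=1.0, steps=50):
    """Gradient descent on L(w) = (lam / 2) * w**2.

    The update w <- w - eta * lam * w scales w by (1 - eta * lam),
    so it contracts iff |1 - eta * lam| < 1, i.e. iff eta < 2 / lam.
    """
    w = w0
    for _ in range(steps):
        w = w - eta * lam * w
    return w

lam = 4.0                                # curvature; threshold is 2/lam = 0.5
print(gd_on_quadratic(lam, eta=0.4))     # eta < 2/lam: tends to the minimum at 0
print(gd_on_quadratic(lam, eta=0.6))     # eta > 2/lam: blows up for any w0 != 0
```

In the divergent regime the only initialization that still reaches a critical point is w0 = 0, which mirrors the measure-zero statement above.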
Abstract
There currently is a significant interest in understanding the edge of stability (EoS) phenomenon, which has been observed in neural network training, characterized by a non-monotonic decrease of the loss […]
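As a toy illustration of that non-monotonic behaviour (an assumed setup, not the authors' experiment): a depth-2 linear model f(x) = a·b·x fit to the single pair (x, y) = (1, 1) with quadratic loss, run at a large step size.

```python
# L(a, b) = 0.5 * (a * b - 1)**2; the global minima form the curve a * b = 1.
def loss(a, b):
    return 0.5 * (a * b - 1.0) ** 2

def grad(a, b):
    r = a * b - 1.0
    return r * b, r * a          # dL/da, dL/db

a, b, eta = 2.5, 0.1, 0.35       # imbalanced start, large step size (illustrative)
losses = []
for _ in range(40):
    ga, gb = grad(a, b)
    a, b = a - eta * ga, b - eta * gb
    losses.append(loss(a, b))

# The first iterates overshoot: the loss rises, oscillates, and only then
# decays -- a non-monotonic decrease rather than a steady one.
print([round(v, 4) for v in losses[:10]])
```

With a small step size the same run decays monotonically; the oscillation appears only once the step size is pushed near the stability threshold.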