BriefGPT.xyz
Jul, 2023
The instabilities of large learning rate training: a loss landscape view
Lawrence Wang, Stephen Roberts
TL;DR
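This work studies the loss landscape by considering the Hessian matrix during network training with large learning rates, revealing the instabilities of gradient descent. It observes two striking phenomena, landscape flattening and landscape shift, both of which are closely tied to the training instabilities.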
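As a rough illustration of why curvature and learning rate interact, the sketch below runs plain gradient descent on a one-dimensional quadratic, where the second derivative plays the role of the Hessian. This is a textbook toy, not the authors' experimental setup: the function name `gradient_descent_quadratic`, the curvature value, and the step count are all assumptions chosen for illustration. For such a quadratic, the iterates shrink when the learning rate is below 2 / curvature and grow when it is above.

```python
def gradient_descent_quadratic(curvature, learning_rate, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns the list of |x| values, one per iterate (including the start).
    """
    x = x0
    trajectory = [abs(x)]
    for _ in range(steps):
        grad = curvature * x          # f'(x) = curvature * x
        x = x - learning_rate * grad  # plain gradient descent update
        trajectory.append(abs(x))
    return trajectory


curvature = 4.0              # 1-D stand-in for the Hessian (hypothetical value)
threshold = 2.0 / curvature  # classical stability limit for a quadratic

# Just below the threshold the iterates contract; just above, they blow up.
stable = gradient_descent_quadratic(curvature, 0.9 * threshold)
unstable = gradient_descent_quadratic(curvature, 1.1 * threshold)

print("below threshold, final |x|:", stable[-1])
print("above threshold, final |x|:", unstable[-1])
```

In a real network the Hessian changes along the trajectory, so this fixed threshold is only a local picture of the behaviour the summary describes.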
Abstract
Modern neural networks are undeniably successful. Numerous works study how the curvature of loss landscapes can affect the quality of solutions. In this work we study the loss landscape by considering the