Sep, 2020
Implicit Gradient Regularization
HTML | PDF
David G. T. Barrett, Benoit Dherin
TL;DR
This paper studies how gradient descent behaves when optimizing neural networks and finds that the discrete steps of gradient descent implicitly regularize models by penalizing trajectories with large loss gradients. This "implicit gradient regularization" biases gradient descent toward flat minima, making solutions robust to noisy parameter perturbations, which helps explain why these models avoid overfitting.
Abstract
Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients.
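
To make the mechanism in the summary concrete, here is a minimal JAX sketch (not the paper's code) that adds the gradient-norm penalty explicitly as a surrogate for the implicit regularization described above. The linear model, the toy data, and the `lr / 4` penalty coefficient are illustrative assumptions; the coefficient follows the backward-error-analysis scaling the paper associates with the learning rate, but treat the exact form here as an assumption.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Plain least-squares loss of a linear model (an illustrative
    # stand-in for a neural-network loss).
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def modified_loss(params, x, y, lr):
    # Explicit version of the penalty attributed to discrete gradient
    # descent steps: loss + (lr / 4) * ||grad loss||^2.
    # The lr / 4 coefficient is an assumption based on the paper's
    # backward-error-analysis result.
    g = jax.grad(loss_fn)(params, x, y)
    return loss_fn(params, x, y) + 0.25 * lr * jnp.sum(g ** 2)

@jax.jit
def step(params, x, y, lr):
    # One gradient-descent step on the explicitly regularized loss,
    # which penalizes moving through regions with large loss gradients.
    return params - lr * jax.grad(modified_loss)(params, x, y, lr)

# Usage: a few steps on random data.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
y = jnp.sum(x, axis=1)
params = jnp.zeros(4)
for _ in range(100):
    params = step(params, x, y, 1e-1)
print(loss_fn(params, x, y))
```

Because the penalty term shrinks the gradient norm along the optimization path, minima reached this way tend to sit in flatter regions of the loss surface, matching the robustness-to-perturbation claim in the summary.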