BriefGPT.xyz
Jan, 2021
On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
TL;DR
This paper studies how the learning rate of stochastic gradient descent (SGD) affects accuracy. It proves that for small but finite learning rates, the SGD iterates stay close to the path of gradient descent on a modified loss; this behaviour is explained by an implicit regularization term, and experiments show that explicitly including this regularizer at a suitable learning rate improves test accuracy.
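As a reader's illustration of the last claim in the TL;DR, the sketch below adds the gradient-norm penalty explicitly to a toy loss and takes one gradient step on the result. It is a minimal sketch in JAX, not the authors' code: the (lr/4)·‖∇L‖² form follows the backward-error-analysis regularizer this line of work studies, and loss_fn, train_step, and the quadratic toy loss are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Placeholder mini-batch loss (toy linear regression); stands in for the
    # training loss L whose implicit regularizer the paper analyses.
    x, y = batch
    preds = x @ params
    return jnp.mean((preds - y) ** 2)

def regularized_loss(params, batch, lr):
    # Loss plus an explicit gradient-norm penalty (lr / 4) * ||grad L||^2,
    # i.e. the implicit regularizer made explicit.
    grads = jax.grad(loss_fn)(params, batch)
    penalty = 0.25 * lr * jnp.sum(grads ** 2)
    return loss_fn(params, batch) + penalty

@jax.jit
def train_step(params, batch, lr):
    # One gradient step on the explicitly regularized loss
    # (requires a second derivative of loss_fn, which JAX provides).
    g = jax.grad(regularized_loss)(params, batch, lr)
    return params - lr * g
```

Here params is a single jnp.ndarray; a real model with nested parameters would need pytree-aware gradient norms (e.g. via jax.tree_util).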
Abstract
For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of gradient flow on the full batch loss function. However, moderately large learning rates can achieve higher test accuracies, and this …
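For reference, the gradient-flow limit stated in the abstract, and the finite-step correction that the implicit regularizer refers to, can be written as follows. This is a reader's note using the (ε/4) backward-error-analysis form studied in this line of work, not a quotation from the truncated abstract.

```latex
% Infinitesimal learning rate: SGD follows gradient flow on the full-batch loss L.
\frac{d\omega}{dt} = -\nabla L(\omega)

% Small but finite learning rate \epsilon: the iterates instead track gradient flow
% on a modified loss with an implicit gradient-norm regularizer.
\tilde{L}(\omega) = L(\omega) + \frac{\epsilon}{4}\,\big\lVert \nabla L(\omega) \big\rVert^{2}
```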