Matias D. Cattaneo, Jason M. Klusowski, Boris Shigida
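TL;DR: Previous work has shown that backward error analysis can be used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. This paper proves that a similar implicit regularization exists in RMSProp and Adam, that it depends on the hyperparameters and the training stage, and that it differs from what earlier work found. We also run numerical experiments and discuss how these facts can affect generalization.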
Abstract
In previous literature, backward error analysis was used to find ordinary
differential equations (ODEs) approximating the gradient descent trajectory. It
was found that finite step sizes implicitly regularize solutions because terms
appearing in the ODEs penalize the two-norm of the lo