Sep 2015
Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt, Benjamin Recht, Yoram Singer
TL;DR
This paper proves that parametric models trained by a stochastic gradient method achieve vanishing generalization error with only a few iterations. This provides a new explanation for why stochastic gradient methods generalize well even when run for multiple epochs, and also gives a new stability-based perspective on the training of neural networks.
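For reference, the stochastic gradient method (SGM) discussed here is the standard update rule; the notation below (step sizes $\alpha_t$, loss $f$, uniformly sampled example index $i_t$) is assumed rather than taken from the excerpt:

\[
w_{t+1} \;=\; w_t \;-\; \alpha_t \,\nabla_w f\big(w_t;\, z_{i_t}\big),
\qquad i_t \sim \mathrm{Uniform}\{1,\dots,n\}.
\]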
Abstract
We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. We prove this by showing the method is algorithmically stable in the sense of Bousquet and Elisseeff.
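As a quick sketch of the stability notion invoked here (a standard statement following Bousquet and Elisseeff, not quoted from the excerpt; $A$ denotes the randomized algorithm, $f$ the loss, $R$ the population risk, and $R_S$ the empirical risk on sample $S$): an algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets $S, S'$ of size $n$ that differ in a single example,

\[
\sup_{z}\; \mathbb{E}_{A}\big[\, f(A(S);\, z) - f(A(S');\, z) \,\big] \;\le\; \epsilon,
\]

and uniform stability bounds the expected generalization gap by the same quantity,

\[
\Big|\, \mathbb{E}_{S,A}\big[\, R_S[A(S)] - R[A(S)] \,\big] \Big| \;\le\; \epsilon .
\]

The paper's argument bounds the stability parameter $\epsilon$ of SGM in terms of the number of iterations and step sizes, which is what yields the vanishing generalization error claimed in the abstract.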