Train faster, generalize better: Stability of stochastic gradient descent

Sep, 2015

Moritz Hardt, Benjamin Recht, Yoram Singer

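TL;DR: This paper proves that parametric models trained with a stochastic gradient method for only a few iterations attain vanishing generalization error. It offers a new explanation for why stochastic gradient methods that run for multiple epochs generalize well, and gives a new stability-based account of neural network training.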
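The stability notion behind this claim can be probed numerically. The sketch below is an illustration only, not an experiment from the paper: it runs plain SGD on a synthetic least-squares problem twice, with the same sampling order, on two datasets that differ in a single example, and reports how far apart the resulting parameters end up. The function names (`sgd_least_squares`, `stability_gap`), the loss, and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def sgd_least_squares(X, y, steps, lr, seed):
    """Plain SGD on squared loss; `seed` fixes the order in which examples are drawn."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))             # draw one example uniformly at random
        grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x_i . w - y_i)^2
        w -= lr * grad
    return w

def stability_gap(steps, n=200, d=5, lr=0.01, seed=0):
    """Parameter distance after SGD on two datasets differing in one example."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    # Neighbouring dataset: replace example 0 and keep everything else.
    X2, y2 = X.copy(), y.copy()
    X2[0] = rng.normal(size=d)
    y2[0] = rng.normal()

    # Same seed in both runs, so both visit the same indices in the same order.
    w1 = sgd_least_squares(X, y, steps, lr, seed=1)
    w2 = sgd_least_squares(X2, y2, steps, lr, seed=1)
    return float(np.linalg.norm(w1 - w2))
```

Calling `stability_gap(steps)` for several values of `steps` shows how the two runs drift apart as training continues. With zero steps the gap is exactly zero, and the two runs stay identical until the replaced example is first sampled.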

Abstract

We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. We prove this by showing the method is algorithmically stable in the sense of Bousquet and El