Sep, 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
TL;DR
By comparing full-batch training with SGD on CIFAR-10 using modern architectures, this paper shows that the implicit regularization of SGD can be fully replaced by explicit regularization, and argues that full-batch training is held back mainly by optimization difficulties and by the substantial time and effort the ML community has invested in tuning small-batch training.
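As a rough illustration of the idea in the TL;DR, the following is a minimal sketch (not the authors' code) of full-batch gradient descent in which SGD's implicit regularization is replaced by an explicit gradient-norm penalty. The toy data, the small MLP, and the coefficient `alpha` are illustrative assumptions, not values or architectures from the paper.

```python
import torch

torch.manual_seed(0)

# Stand-ins for a real dataset/model; CIFAR-10 and a modern architecture
# are replaced here by small illustrative tensors and an MLP.
X = torch.randn(512, 32)
y = torch.randint(0, 10, (512,))
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # full batch => plain GD
alpha = 0.01  # assumed penalty strength; not a value from the paper

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)  # loss on the entire training set at once
    # Explicit regularizer: squared norm of the loss gradient, standing in
    # for the implicit regularization usually attributed to SGD noise.
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads)
    (loss + alpha * penalty).backward()
    opt.step()
```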
Abstract
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.