BriefGPT.xyz
Feb, 2021
The Strength of Minibatch Noise in Stochastic Gradient Descent
On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes
Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
TL;DR
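Analyzes the noise and fluctuations induced by minibatch sampling in stochastic gradient descent, shows how a large learning rate can aid generalization by introducing implicit regularization, and offers a way to understand how the discrete-time nature of SGD affects its power-law behavior.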
Abstract
The noise in stochastic gradient descent (SGD), caused by minibatch sampling, remains poorly understood despite its enormous practical importance in offering good training efficiency and generalization ability. I