BriefGPT.xyz
Jun, 2020
Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
TL;DR
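By studying the dynamics of stochastic gradient descent (SGD) with state-dependent noise, we prove that power-law dynamics can escape from sharp minima faster than previously studied dynamics, and, building on this result, we propose a new method to further improve SGD's generalization ability.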
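To make "state-dependent noise" concrete, here is a minimal 1-D toy sketch. It is not the authors' method or code, and every name in it (diffusion, escape_time, mean_escape_time) is hypothetical. It assumes a quadratic basin L(θ) = hθ²/2 (h plays the role of sharpness) and a noise variance D(θ) = a + b(θ − θ*)² that grows quadratically with distance from the minimum θ*, then simulates dθ = −hθ dt + √D(θ) dW with an Euler–Maruyama step of size eta. Setting b = 0 recovers the constant-noise assumption. For b > 0 the stationary density of this SDE is proportional to (a + bθ²)^(−1−h/b), a power law, whereas b = 0 gives a Gaussian.

```python
import math
import random


def diffusion(theta, a, b, theta_star=0.0):
    """Hypothetical noise variance D(theta) = a + b * (theta - theta_star)**2.

    b = 0 is the constant-noise case; b > 0 makes the noise grow
    quadratically with the distance from the minimum theta_star.
    """
    return a + b * (theta - theta_star) ** 2


def escape_time(h, a, b, eta=0.01, radius=1.0, max_steps=100_000, rng=None):
    """Number of steps until the iterate leaves the basin |theta| < radius.

    Loss is the assumed quadratic basin L(theta) = h/2 * theta**2, so the
    gradient is h * theta. Each step is an Euler-Maruyama update
        theta <- theta - eta * h * theta + sqrt(eta * D(theta)) * xi,
    with xi ~ N(0, 1). Returns None if no escape within max_steps.
    """
    rng = rng or random.Random()
    theta = 0.0  # start exactly at the minimum
    for step in range(1, max_steps + 1):
        noise = math.sqrt(eta * diffusion(theta, a, b)) * rng.gauss(0.0, 1.0)
        theta = theta - eta * h * theta + noise
        if abs(theta) >= radius:
            return step
    return None


def mean_escape_time(h, a, b, trials=100, seed=0, **kwargs):
    """Average escape time over independent runs sharing one seeded RNG."""
    rng = random.Random(seed)
    times = [escape_time(h, a, b, rng=rng, **kwargs) for _ in range(trials)]
    finished = [t for t in times if t is not None]
    return sum(finished) / len(finished) if finished else math.inf


if __name__ == "__main__":
    # Same basin and base noise level; only the state-dependent term b differs.
    for b in (0.0, 5.0):
        steps = mean_escape_time(h=2.0, a=0.5, b=b)
        print(f"b={b}: mean escape steps = {steps:.0f}")
```

Comparing b = 0 with b > 0 on the same seed gives a rough feel for how noise that grows away from the minimum pushes the iterate out of a sharp basin sooner; the exact numbers depend on the seed, step size, and the assumed parameters.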
Abstract
Stochastic gradient descent (SGD) and its variants are mainstream methods to train deep neural networks. Since neural networks are non-convex…