BriefGPT.xyz
Jun, 2020
The Heavy-Tail Phenomenon in SGD
Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu
TL;DR
This paper examines the relationship between the generalization performance of stochastic gradient descent (SGD) in deep learning and the flatness of the minima it finds, and shows, through the analysis of simple problems such as linear regression, that the choice of algorithm parameters affects both the convergence rate and the resulting probability distribution of the iterates.
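To make the summary concrete, below is a minimal, illustrative Python sketch (not the authors' code) of the kind of setting the paper studies: SGD on a synthetic Gaussian linear-regression problem, comparing how different step-size/batch-size choices change how often the iterates make very large excursions. The function name `sgd_linear_regression` and all problem sizes are hypothetical.

```python
# Illustrative sketch only: SGD on synthetic least squares, comparing how the
# step-size/batch-size choice affects the spread of the final iterates.
import numpy as np

rng = np.random.default_rng(0)

def sgd_linear_regression(step_size, batch_size, n_iters=2000, d=5, n=1000):
    """Run mini-batch SGD on a Gaussian least-squares problem; return final error norm."""
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.5 * rng.standard_normal(n)
    w = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.integers(0, n, size=batch_size)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= step_size * grad
    return np.linalg.norm(w - w_true)

# Repeat many runs: a larger step-size with a smaller batch tends to produce
# occasional very large deviations (heavier tails), even when the median error
# stays small.
for eta, b in [(0.01, 32), (0.1, 1)]:
    errs = np.array([sgd_linear_regression(eta, b) for _ in range(200)])
    print(f"eta={eta}, batch={b}: median={np.median(errs):.3f}, max={errs.max():.3f}")
```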
Abstract
In recent years, various notions of capacity and complexity have been proposed for characterizing the generalization properties of stochastic gradient descent (SGD) in deep learning. Some of the popular notions t…