BriefGPT.xyz
Jun, 2019
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Kosuke Haruki, Taiji Suzuki, Yohei Hamakawa, Takeshi Toda, Ryuji Sakai...
TL;DR
Gradient Noise Convolution addresses the underfitting and sharp minima that arise in large-batch SGD for distributed deep learning. By convolving the loss landscape with gradient noise, the method smooths sharp minima more effectively and improves the model's generalization performance.
Abstract
Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep learning because of its training-time efficiency; however, extremely large-batch SGD leads to poor generalization…
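The core idea, smoothing a loss function by convolving it with noise, can be illustrated with a toy sketch. This is not the authors' exact algorithm (which builds the perturbation from SGD's own gradient noise across distributed workers); it is a minimal, self-contained illustration of noise-convolved gradients on a hypothetical 1-D loss with a sharp minimum, with all names and parameters chosen for illustration:

```python
import numpy as np

def grad(w):
    # gradient of a toy 1-D loss L(w) = 1 - exp(-50 w^2),
    # which has a single sharp minimum at w = 0
    return 100.0 * w * np.exp(-50.0 * w**2)

def smoothed_grad(grad_fn, w, noise_scale, n_samples, rng):
    # Monte Carlo estimate of the gradient of the noise-convolved loss:
    #   E_e[ grad L(w + e) ],  e ~ N(0, noise_scale^2)
    # Convolution with noise widens the sharp minimum's basin.
    eps = rng.normal(0.0, noise_scale, size=n_samples)
    return float(np.mean([grad_fn(w + e) for e in eps]))

rng = np.random.default_rng(0)
w0 = 0.5  # a point outside the narrow basin, where the raw loss is nearly flat

g_plain = grad(w0)                                              # nearly zero
g_smooth = smoothed_grad(grad, w0, noise_scale=0.2,
                         n_samples=1000, rng=rng)               # clearly nonzero
```

At `w0 = 0.5` the raw gradient is vanishingly small (the sharp minimum's signal does not reach that far), while the smoothed gradient is substantially larger, pulling the iterate toward the minimum. This is the intuition for why smoothing by noise convolution helps large-batch SGD escape or avoid sharp minima.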