Sep 2018
The Convergence of Sparsified Gradient Methods
Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov...
TL;DR
This paper studies gradient-sparsification methods for the distributed training of deep neural networks. It proves that, under certain analytic conditions, partial gradient updates that select components by magnitude converge, validates the effectiveness of this approach, and explores the conditions required for convergence.
Abstract
Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed.
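
The technique the TL;DR describes, selecting only the largest-magnitude gradient components for communication while accumulating the rest locally, can be illustrated with a short sketch. The Python below is an illustrative toy, not the paper's implementation; the function name topk_sparsify, the choice of k, and the quadratic objective are assumptions made for demonstration.

import numpy as np

def topk_sparsify(grad, k, error):
    # Magnitude-based top-k sparsification with local error accumulation:
    # add the locally accumulated error to the fresh gradient, keep only
    # the k largest-magnitude components for communication, and store the
    # dropped residual for the next step.
    acc = grad + error
    idx = np.argpartition(np.abs(acc), -k)[-k:]  # indices of top-k magnitudes
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]          # components that would be transmitted
    new_error = acc - sparse        # residual kept locally (error feedback)
    return sparse, new_error

# Toy usage: SGD-style steps on f(w) = ||w||^2 / 2, applying only the
# sparsified gradient at each step.
rng = np.random.default_rng(0)
w = rng.standard_normal(10)
error = np.zeros_like(w)
lr, k = 0.1, 3
for _ in range(100):
    grad = w                        # gradient of the quadratic objective
    sparse_grad, error = topk_sparsify(grad, k, error)
    w -= lr * sparse_grad           # update uses only the top-k components
print(np.linalg.norm(w))            # norm shrinks despite sparsification

The error-feedback term is the key design point: components dropped in one step are not discarded but re-enter the selection in later steps, which is what the paper's analysis exploits to bound the impact of stale updates.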