May, 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
TL;DR
This paper proves that local stochastic gradient descent converges on convex problems at the same rate as mini-batch stochastic gradient descent, achieving a linear speedup in both the number of workers and the mini-batch size, while the number of communication rounds can be reduced by a factor of up to T^{1/2}.
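To make the claimed saving concrete, the hedged sketch below unpacks the arithmetic. Here H denotes the number of local steps between synchronizations (notation assumed for this sketch): synchronizing only every H steps turns the T communication rounds of mini-batch SGD into T/H rounds, and taking H up to order T^{1/2} yields the T^{1/2}-fold reduction. The exact admissible size of H is governed by the paper's theorems, not reproduced here.

```latex
% Arithmetic behind the T^{1/2} saving. H (local steps between
% synchronizations) is notation assumed in this sketch; the conditions
% on H come from the paper's theorems.
\[
  \underbrace{T}_{\text{communication rounds of mini-batch SGD}}
  \qquad\text{vs.}\qquad
  \frac{T}{H} = \Theta\!\bigl(T^{1/2}\bigr)
  \ \text{rounds of local SGD for } H = \Theta\!\bigl(T^{1/2}\bigr).
\]
```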
Abstract
Mini-batch stochastic gradient descent (SGD) is the state of the art in large-scale parallel machine learning, but its scalability is limited by a communication bottleneck. Recent work proposed local SGD, i.e. running SGD independently in parallel on different workers and averaging the sequences only once in a while.
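As a rough illustration of the scheme, the minimal single-process sketch below simulates local SGD on a synthetic convex least-squares problem: each of K workers runs H local SGD steps on its own data shard, and the iterates are averaged only at communication rounds. The problem instance and the values of K, H, T, and the step size are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal single-process simulation of local SGD on a convex least-squares
# problem. The data, worker count K, local-phase length H, round count T,
# and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic convex problem: minimize (1/n) * sum_i (a_i^T x - b_i)^2.
n, d = 1000, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

K = 4      # number of workers
H = 10     # local SGD steps between communication (averaging) rounds
T = 50     # communication rounds
lr = 0.01  # step size

# Split the data evenly across workers.
shards = np.array_split(rng.permutation(n), K)

def sgd_step(x, idx):
    """One stochastic gradient step on a single sample of the quadratic loss."""
    a_i, b_i = A[idx], b[idx]
    grad = 2.0 * (a_i @ x - b_i) * a_i
    return x - lr * grad

# All workers start from the same point.
x_workers = [np.zeros(d) for _ in range(K)]

for t in range(T):
    # Each worker runs H local steps on its own shard, with no communication.
    for k in range(K):
        for _ in range(H):
            idx = rng.choice(shards[k])
            x_workers[k] = sgd_step(x_workers[k], idx)
    # Communication round: average the local iterates and redistribute.
    x_avg = np.mean(x_workers, axis=0)
    x_workers = [x_avg.copy() for _ in range(K)]

print("final loss:", np.mean((A @ x_avg - b) ** 2))
```

Compared with mini-batch SGD, which synchronizes after every step, this loop communicates only T times while performing K * H * T gradient evaluations in total.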