May, 2018

Local SGD Converges Fast and Communicates Little

Sebastian U. Stich

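TL;DR: This paper proves that local SGD converges on convex problems at the same rate as mini-batch SGD, achieving a linear speedup in the number of workers and the mini-batch size, while the number of communication rounds can be reduced by a factor of up to T^{1/2}.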
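To make the idea concrete, here is a minimal single-process sketch of local SGD, assuming a toy noisy quadratic objective and a constant step size. Neither choice comes from the paper, and the names `local_sgd`, `noisy_quadratic_grad`, and `sync_every` are hypothetical. Each simulated worker takes SGD steps on its own copy of the parameters, and the copies are averaged only every `sync_every` steps. With `sync_every=1` the workers average after every step, which corresponds to mini-batch SGD.

```python
import random


def local_sgd(stochastic_grad, x0, num_workers, total_steps, sync_every, lr, seed=0):
    """Simulate local SGD; return (averaged iterate, number of communication rounds)."""
    rng = random.Random(seed)
    # Every worker starts from the same point.
    workers = [list(x0) for _ in range(num_workers)]
    rounds = 0
    for t in range(1, total_steps + 1):
        # Each worker takes one SGD step on its own copy, with no communication.
        for x in workers:
            g = stochastic_grad(x, rng)
            for i in range(len(x)):
                x[i] -= lr * g[i]
        # Every `sync_every` steps (and after the last step), average and broadcast.
        if t % sync_every == 0 or t == total_steps:
            avg = [sum(x[i] for x in workers) / num_workers for i in range(len(x0))]
            workers = [list(avg) for _ in range(num_workers)]
            rounds += 1
    return workers[0], rounds


# Toy problem (an assumption for illustration): minimize 0.5 * ||x - TARGET||^2.
TARGET = [1.0, -2.0]


def noisy_quadratic_grad(x, rng):
    # Exact gradient (x - TARGET) plus zero-mean Gaussian noise.
    return [xi - ti + rng.gauss(0.0, 0.1) for xi, ti in zip(x, TARGET)]


# Averaging after every step versus averaging every 20 steps.
x_sync, rounds_sync = local_sgd(noisy_quadratic_grad, [0.0, 0.0], 4, 400, 1, 0.05)
x_local, rounds_local = local_sgd(noisy_quadratic_grad, [0.0, 0.0], 4, 400, 20, 0.05)
```

In this sketch both runs perform the same number of gradient evaluations per worker, but the second run averages 20 times less often. This only illustrates the communication pattern; it is not a reproduction of the paper's analysis or experiments.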

Abstract

Mini-batch stochastic gradient descent (SGD) is the state of the art in large-scale parallel machine learning, but its scalability is limited by a communication bottleneck. Recent work proposed local SGD, i.e. ru