TL;DR: This paper proposes a Variance Reduced Local SGD algorithm that eliminates the dependence on the gradient variance among workers, achieving lower communication complexity while retaining linear iteration speedup, and demonstrates superior performance on three machine learning tasks.
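For context, the paper builds on plain Local SGD, where workers take several local gradient steps between communication rounds. Below is a minimal sketch of that baseline (not the paper's variance-reduced variant); the quadratic loss, synthetic data, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Sketch of generic Local SGD with periodic averaging (NOT the paper's
# variance-reduced variant): each of W workers runs H local SGD steps on
# its own data, then all workers average their models once.
# Loss, data, and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)
W, H, rounds, lr, d = 4, 5, 20, 0.1, 3

# Each worker holds noisy samples centered at a shared optimum x_star.
x_star = np.ones(d)
data = [x_star + 0.1 * rng.standard_normal((50, d)) for _ in range(W)]

models = [np.zeros(d) for _ in range(W)]
for _ in range(rounds):
    for w in range(W):
        for _ in range(H):  # H local steps, no communication
            sample = data[w][rng.integers(len(data[w]))]
            grad = models[w] - sample  # gradient of 0.5*||x - sample||^2
            models[w] = models[w] - lr * grad
    # One communication round: average all worker models.
    avg = np.mean(models, axis=0)
    models = [avg.copy() for _ in range(W)]

print(np.linalg.norm(models[0] - x_star))  # distance to the optimum
```

Communicating only every H steps is what reduces communication cost relative to fully synchronous SGD; the paper's contribution is removing the extra variance penalty this introduces across heterogeneous workers.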
Abstract
To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants, which run multiple workers in parallel, have been widely adopted. Among