April 2023

Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

Maximilian Egger, Serge Kas Hanna, Rawad Bitar

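TL;DR: By adaptively adjusting the number of worker nodes and their computation load, this work speeds up the convergence of distributed stochastic gradient descent and significantly reduces the computation load, at the cost of a slight increase in the communication load.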
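To make the trade-off in the TL;DR concrete, here is a minimal simulation sketch of generic "fastest-k" distributed SGD, in which the main node waits only for the k fastest of n workers in each iteration and k grows in stages. This is not the authors' algorithm: the function name `fastest_k_sgd`, the exponential straggler model, the least-squares task, and the stage-wise schedule for k are all hypothetical choices made for illustration.

```python
import numpy as np


def fastest_k_sgd(n_workers=10, k_schedule=(2, 5, 10), iters_per_stage=200,
                  lr=0.05, batch=8, seed=0):
    """Hypothetical fastest-k SGD simulation on synthetic least squares.

    Assumptions (not from the paper): worker response times are i.i.d.
    Exponential(1), every worker samples a mini-batch uniformly from the
    same dataset, and k increases stage by stage according to k_schedule.
    """
    rng = np.random.default_rng(seed)
    d, n_samples = 5, 2000
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n_samples, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_samples)

    w = np.zeros(d)
    wall_clock = 0.0
    losses = []
    for k in k_schedule:
        for _ in range(iters_per_stage):
            # Assumed straggler model: draw every worker's response time.
            times = rng.exponential(1.0, size=n_workers)
            # The iteration ends when the k-th fastest worker responds.
            wall_clock += np.sort(times)[k - 1]
            # Only the k fastest workers contribute a stochastic gradient.
            grads = []
            for _ in range(k):
                idx = rng.integers(0, n_samples, size=batch)
                Xb, yb = X[idx], y[idx]
                grads.append(Xb.T @ (Xb @ w - yb) / batch)
            # The main node averages the received gradients and updates w.
            w -= lr * np.mean(grads, axis=0)
            losses.append(float(np.mean((X @ w - y) ** 2)))
    return w, wall_clock, losses


if __name__ == "__main__":
    w, t, losses = fastest_k_sgd()
    print(f"simulated time {t:.1f}, final MSE {losses[-1]:.4f}")
```

Waiting for a small k keeps each iteration short, because the k-th order statistic of the response times is small, but the averaged gradient is noisier. Increasing k later reduces that noise once the model is near the optimum. This sketch captures only the waiting-time side of the trade-off. It does not model the computation-load reduction the paper targets.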

Abstract

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of opt