Jan, 2022
Near-Optimal Sparse Allreduce for Distributed Deep Learning
Shigang Li, Torsten Hoefler
TL;DR
This paper proposes O$k$-Top$k$, a scheme that integrates a novel sparse allreduce algorithm with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer. O$k$-Top$k$ achieves model accuracy comparable to dense allreduce, and, compared with optimized dense and state-of-the-art sparse allreduce implementations, it is more scalable and significantly improves training throughput.
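For background, here is a minimal sketch of top-$k$ gradient sparsification, the general idea the TL;DR refers to. This is illustrative only: the helper name topk_sparsify and the use of PyTorch are assumptions, not the paper's O$k$-Top$k$ implementation.

    import torch

    def topk_sparsify(grad: torch.Tensor, k: int):
        # Keep only the k largest-magnitude gradient entries.
        # Only ~k (index, value) pairs need to be communicated
        # in the allreduce instead of the full dense gradient.
        flat = grad.flatten()
        _, idx = torch.topk(flat.abs(), k)
        return idx, flat[idx]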
Abstract
Communication overhead is one of the major obstacles to training large deep learning models at scale. Gradient sparsification is a promising technique to reduce the communication volume. However, it is very challenging […]