Sep 2022
Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment
Daegun Yoon, Sangyoon Oh
TL;DR
The paper studies the use of Top-k SGD, which reduces communication traffic to improve the training performance of deep learning models on multiple GPUs. However, the method is limited because gradient sorting is inefficient on GPUs; designing a high-performance gradient sparsification method is proposed as future work.
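For context, the sketch below shows what Top-k gradient sparsification typically looks like in PyTorch; it is an illustration under assumptions, not the authors' implementation, and the names `topk_sparsify` and `k_ratio` are hypothetical. Each worker keeps only the k largest-magnitude gradient entries, so only those values and their indices need to be communicated; the `torch.topk` selection step corresponds to the on-GPU sorting cost that the paper identifies as the bottleneck.

```python
import torch

def topk_sparsify(grad: torch.Tensor, k_ratio: float = 0.01):
    """Keep only the k largest-magnitude elements of a gradient.

    Illustrative sketch of Top-k sparsification (hypothetical helper,
    not the paper's code): only the selected values and indices are
    communicated, which is the traffic reduction the paper analyzes.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_ratio))
    # Selecting the top-k by magnitude is the sorting/selection step
    # that the paper reports as inefficient on GPUs for large tensors.
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices  # sparse payload for communication

# Example: sparsify a synthetic gradient, keeping the top 1%.
g = torch.randn(1_000_000)
vals, idx = topk_sparsify(g, k_ratio=0.01)
print(vals.shape, idx.shape)  # torch.Size([10000]) torch.Size([10000])
```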
Abstract
To train deep learning models faster, distributed training on multiple GPUs is a very popular scheme in recent years. However, the communication …