Dec, 2017
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
TL;DR
This paper proposes Deep Gradient Compression (DGC), building on the finding that 99.9% of the gradient exchange in distributed SGD is redundant. Using momentum correction, local gradient clipping, momentum factor masking, and warm-up training, DGC greatly reduces the required communication bandwidth while preserving model accuracy, enabling large-scale distributed training over 1Gbps Ethernet and on mobile devices.
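To make the mechanism concrete, the sketch below (not the authors' implementation) shows what a single worker's DGC-style step could look like: local gradient clipping, momentum correction with local accumulation before top-k sparsification, and momentum factor masking of the transmitted coordinates. The function name dgc_step, the buffers u and v, and the clip_norm parameter are illustrative assumptions; in the full system the resulting sparse updates would be aggregated across workers.

```python
import numpy as np

def dgc_step(grad, u, v, momentum=0.9, sparsity=0.999, clip_norm=1.0):
    """One worker's DGC-style step (illustrative sketch, not the paper's code)."""
    # Local gradient clipping: rescale the local gradient before accumulation
    # so accumulated values do not explode between exchanges.
    scale = min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-8))
    grad = grad * scale

    # Momentum correction: apply momentum to the local gradient stream, so the
    # sparsified updates still follow the momentum-SGD trajectory.
    u = momentum * u + grad

    # Local gradient accumulation: small coordinates are not dropped; they keep
    # accumulating until they cross the sparsification threshold.
    v = v + u

    # Keep only the top (1 - sparsity) fraction of coordinates by magnitude;
    # sparsity=0.999 means roughly 0.1% of the entries are exchanged.
    k = max(1, int(v.size * (1.0 - sparsity)))
    threshold = np.partition(np.abs(v).ravel(), -k)[-k]
    mask = np.abs(v) >= threshold

    sparse_update = np.where(mask, v, 0.0)  # the values this worker transmits

    # Momentum factor masking: clear both buffers at the transmitted positions
    # so stale momentum does not push the delayed coordinates too far later.
    u = np.where(mask, 0.0, u)
    v = np.where(mask, 0.0, v)
    return sparse_update, u, v
```

Warm-up training, the fourth technique, would wrap a step like this in a schedule that ramps the sparsity up (e.g. from 75% toward 99.9%) over the first few epochs, giving the optimizer time to adapt before compression becomes aggressive.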
Abstract
Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.