Dec, 2017
AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang...
TL;DR
This paper proposes Adaptive Residual Gradient Compression (AdaComp), a technique that remains robust across multiple domains, datasets, optimizers, and network parameters, achieving end-to-end compression ratios of roughly 200x for fully-connected and recurrent layers and roughly 40x for convolutional layers.
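The page does not reproduce the algorithm itself, but the core idea behind residual gradient compression can be sketched. Below is a minimal NumPy illustration of an AdaComp-style compression step, assuming a bin-wise local-maximum selection rule with a self-adjusted threshold; the function name, the bin_size default, and the exact threshold rule are illustrative assumptions, not details taken from this page.

```python
import numpy as np

def adacomp_compress(grad, residue, bin_size=256):
    """One AdaComp-style compression step (illustrative sketch).

    grad    : latest local gradient for a layer, flattened to 1-D
    residue : accumulated components not yet sent (same shape as grad)
    Returns (indices, values, new_residue), where (indices, values)
    form the sparse message exchanged between workers.
    """
    G = residue + grad   # fold the new gradient into the carried residue
    H = G + grad         # self-adjusted score: the newest gradient counts twice
    send_idx = []

    for start in range(0, G.size, bin_size):
        end = min(start + bin_size, G.size)
        g_max = np.abs(G[start:end]).max()  # local threshold for this bin
        # select components whose adjusted magnitude reaches the bin maximum
        hits = start + np.nonzero(np.abs(H[start:end]) >= g_max)[0]
        send_idx.extend(hits.tolist())

    send_idx = np.asarray(send_idx, dtype=np.int64)
    send_val = G[send_idx]

    new_residue = G.copy()
    new_residue[send_idx] = 0.0  # sent components leave the local residue
    return send_idx, send_val, new_residue
```

Because components that are not sent stay in the residue, no gradient information is permanently discarded, and the per-bin local-max threshold is what is intended to let the scheme adapt across layer types without hand-tuned sparsity parameters.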
Abstract
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this …