Sep 2018
Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
TL;DR
For distributed training, the paper analyzes SGD with gradient compression (e.g., top-k or random-k sparsification) and shows that, when equipped with error compensation, it converges at the same rate as vanilla SGD while greatly reducing the amount of data communicated, yielding better distributed scalability.
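To make the error-compensation idea concrete, here is a minimal NumPy sketch of top-k sparsified SGD with memory: the dropped gradient mass is accumulated and added back before the next compression step. The function names (`top_k`, `sparsified_sgd`, `grad_fn`) and the toy quadratic objective are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd(grad_fn, x0, lr=0.1, k=1, steps=1000):
    x = x0.copy()
    m = np.zeros_like(x0)      # memory of accumulated compression error
    for _ in range(steps):
        g = grad_fn(x)
        v = m + lr * g         # add back previously dropped gradient mass
        u = top_k(v, k)        # only these k coordinates would be communicated
        m = v - u              # remember what was not transmitted
        x -= u                 # apply the sparse update
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x_star = sparsified_sgd(lambda x: x, x0=np.ones(10), lr=0.1, k=2)
print(np.linalg.norm(x_star))  # should be close to 0
```

Without the memory term `m`, the coordinates that top-k discards each step would simply be lost; accumulating them ensures every coordinate is eventually applied, which is what allows the scheme to match the convergence rate of uncompressed SGD.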
Abstract
Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. …