Jan 2019
Distributed Learning with Compressed Gradient Differences
Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik
TL;DR
This paper proposes a new distributed learning method, DIANA, which addresses the communication bottleneck of model updates by compressing gradient differences. A theoretical analysis in the strongly convex and nonconvex settings shows that DIANA's convergence rate improves upon existing methods.
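To make the gradient-difference idea concrete, below is a minimal NumPy sketch of a DIANA-style round on a toy quadratic problem. The random-k compressor, the step sizes `lr` and `alpha`, and the toy objective are illustrative assumptions, not the paper's exact setup; the point is only that each worker transmits a compressed difference between its gradient and a locally maintained shift, so the transmitted vectors shrink as the shifts learn the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress(v, k, rng):
    # Random-k sparsification, rescaled by d/k so E[compress(v)] = v.
    # Stands in for any unbiased quantization/sparsification operator.
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out

# Toy problem: worker i holds f_i(x) = 0.5 * ||x - b_i||^2, so the
# minimizer of the average objective is the mean of the b_i.
n, d, k = 4, 10, 2
b = rng.standard_normal((n, d))
x = np.zeros(d)
h = np.zeros((n, d))           # per-worker shifts, learned over time
lr, alpha = 0.2, 0.1           # illustrative step sizes

for _ in range(1000):
    g_hat = np.zeros(d)
    for i in range(n):
        g_i = x - b[i]                        # local gradient
        delta = compress(g_i - h[i], k, rng)  # only this is transmitted
        g_hat += h[i] + delta                 # server-side estimate of g_i
        h[i] += alpha * delta                 # shift drifts toward g_i
    x -= lr * g_hat / n

print(np.linalg.norm(x - b.mean(axis=0)))  # distance to optimum: small
```

Because the compressor is unbiased and each shift drifts toward its worker's gradient, the compressed differences (and hence the compression noise) shrink as the iterates approach the optimum, which is the mechanism behind the improved convergence the TL;DR refers to.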
Abstract
Training very large machine learning models requires a distributed computing approach, with communication of the model updates often being the bottleneck. For this reason, several methods based on the compression of the model updates were recently proposed.
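For context on the compression operators such methods rely on, here is one standard example: unbiased ternary (sign-based) quantization in the spirit of TernGrad. This is a generic sketch of the technique, not an operator defined in this paper.

```python
import numpy as np

def ternary_quantize(v, rng):
    # Each coordinate is sent as one of {-1, 0, +1} plus a single scalar
    # norm, so a d-dimensional float vector costs ~2 bits per coordinate.
    s = np.abs(v).max()
    if s == 0.0:
        return np.zeros_like(v)
    keep = rng.random(v.size) < np.abs(v) / s  # P[keep_i] = |v_i| / s
    return s * np.sign(v) * keep               # unbiased: E[Q(v)] = v

rng = np.random.default_rng(0)
v = rng.standard_normal(8)
print(v)
print(ternary_quantize(v, rng))
```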