BriefGPT.xyz
Nov, 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun, Shlomi Azoulay, Pavel Kisilev
TL;DR
This work addresses the efficiency of gradient aggregation in distributed deep learning under limited communication. By casting the aggregation process as an objective-guided subspace optimization problem, it derives a new weighting scheme and introduces subspace momentum to accelerate convergence while keeping the aggregation statistically unbiased. Experiments show that the method outperforms conventional gradient averaging on several machine learning tasks while being more efficient.
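The TL;DR describes weighting per-worker gradients (rather than uniformly averaging them) and adding a momentum term restricted to the span of the current gradients. A minimal sketch of that idea is below; the alignment-based weighting rule and the least-squares projection are illustrative assumptions for this summary, not the paper's actual formulas.

```python
import numpy as np

def weighted_aggregate(grads, momentum, beta=0.9):
    """Illustrative subspace-style aggregation of per-worker gradients.

    The weighting and projection here are hypothetical stand-ins for the
    paper's scheme: weights come from each gradient's alignment with the
    plain average, and momentum is projected onto the span of the
    current gradients before being updated.
    """
    G = np.stack(grads)              # shape (workers, dim)
    mean = G.mean(axis=0)            # uniform average as a reference direction

    # Weight each worker's gradient by its alignment with the mean.
    w = np.maximum(G @ mean, 0.0)
    if w.sum() > 0:
        w = w / w.sum()
    else:
        w = np.full(len(grads), 1.0 / len(grads))  # fall back to averaging
    agg = w @ G

    # "Subspace momentum": keep momentum inside span(grads) via a
    # least-squares projection onto the gradient subspace.
    coeffs = np.linalg.lstsq(G.T, momentum, rcond=None)[0]
    proj = G.T @ coeffs
    new_momentum = beta * proj + (1.0 - beta) * agg
    return new_momentum, agg
```

In a real synchronous setup the `grads` list would come from an all-reduce or gather step across workers; the point of the sketch is only that the combination weights need not be uniform.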
Abstract
Distributed Machine Learning has recently become a critical paradigm for training large models on vast datasets. We examine the Stochastic Optimization problem for deep learning within synchronous parallel computing environments…