BriefGPT.xyz
May, 2024
Adjacent Leader Decentralized Stochastic Gradient Descent
Haoze He, Jing Wang, Anna Choromanska
TL;DR
We propose an adjacent-leader decentralized gradient descent method, AL-DSGD, which uses weight assignment and dynamic communication graphs to accelerate convergence and reduce communication overhead in decentralized deep learning optimization, improving test performance over state-of-the-art methods.
Abstract
This work focuses on the decentralized deep learning optimization framework. We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD) for improving final model performance, accelerating convergence, and reducing communication overhead.
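To make the high-level description above concrete, here is a minimal Python sketch of one decentralized update of this kind for a single worker. The function name al_dsgd_step, the uniform mixing with adjacent workers, and the leader_weight coefficient are illustrative assumptions, not the paper's exact algorithm or weighting scheme.

import numpy as np

def al_dsgd_step(params, grad, neighbor_params, leader_params,
                 lr=0.1, leader_weight=0.2):
    """Illustrative single-worker update: take a local SGD step, average
    with adjacent workers, then pull the result toward the parameters of
    the best-performing adjacent ('leader') worker."""
    local = params - lr * grad                                        # local SGD step
    mixed = np.vstack([local] + list(neighbor_params)).mean(axis=0)   # mix with neighbors
    return (1.0 - leader_weight) * mixed + leader_weight * leader_params

# Toy usage with random vectors standing in for model parameters.
rng = np.random.default_rng(0)
p, g = rng.normal(size=10), rng.normal(size=10)
neighbors = [rng.normal(size=10) for _ in range(3)]
p_next = al_dsgd_step(p, g, neighbors, leader_params=neighbors[0])

In the method described above, the mixing weights are assigned rather than uniform and the communication graph changes dynamically; this sketch does not capture those details.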