Oct 2021
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
Chulhee Yun, Shashank Rajput, Suvrit Sra
TL;DR
Studies local SGD and minibatch SGD in distributed learning under shuffling-based (without-replacement) sampling; these schemes reduce the bias and variance introduced by with-replacement sampling and improve training efficiency, and the paper's tight convergence bounds show they outperform their with-replacement counterparts.
Abstract
In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods.
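
The abstract contrasts two distributed schemes: minibatch SGD, where workers synchronize after every averaged gradient step, and local SGD, where each worker takes several local steps between averaging rounds; the shuffling-based (Random Reshuffling, RR) variants draw samples without replacement. Below is a minimal sketch of one epoch of each variant on a toy least-squares problem; the objective, data layout, and all function names are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Toy least-squares problem split across M workers; all names and the
# objective are illustrative assumptions, not code from the paper.
rng = np.random.default_rng(0)
M, n, d = 4, 32, 5                      # workers, samples per worker, dimension
A = rng.normal(size=(M, n, d))          # A[m, i]: feature vector of sample i on worker m
b = rng.normal(size=(M, n))             # b[m, i]: target of sample i on worker m

def grad(x, m, i):
    """Gradient of the per-sample loss 0.5 * (a^T x - b)^2."""
    a = A[m, i]
    return (a @ x - b[m, i]) * a

def minibatch_rr_epoch(x, lr):
    """Minibatch RR: every step averages one fresh without-replacement
    sample from each worker, so workers synchronize at every step."""
    perms = [rng.permutation(n) for _ in range(M)]
    for t in range(n):
        g = np.mean([grad(x, m, perms[m][t]) for m in range(M)], axis=0)
        x = x - lr * g
    return x

def local_rr_epoch(x, lr):
    """Local RR: each worker runs sequential SGD over its own shuffled
    samples, and iterates are averaged once per epoch (one communication)."""
    local_iterates = []
    for m in range(M):
        xm = x.copy()
        for i in rng.permutation(n):
            xm = xm - lr * grad(xm, m, i)
        local_iterates.append(xm)
    return np.mean(local_iterates, axis=0)

def full_loss(x):
    return 0.5 * np.mean((A.reshape(-1, d) @ x - b.ravel()) ** 2)

x_mb = np.zeros(d)
x_loc = np.zeros(d)
for _ in range(50):                     # 50 epochs of each scheme
    x_mb = minibatch_rr_epoch(x_mb, lr=0.01)
    x_loc = local_rr_epoch(x_loc, lr=0.01)
print("minibatch RR loss:", full_loss(x_mb))
print("local RR loss:    ", full_loss(x_loc))
```

The trade-off visible in this sketch, minibatch RR communicating at every step versus local RR communicating once per epoch, is the comparison the paper's upper and lower bounds quantify.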