Jun, 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao
TL;DR
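This paper challenges the conventional belief and offers a new perspective on decentralized learning: it proves that decentralized stochastic gradient descent (D-SGD) implicitly minimizes the loss function of an average-direction sharpness-aware minimization (SAM) algorithm. This asymptotic equivalence, established under general non-convex and non-$\beta$-smooth settings, reveals an intrinsic regularization-optimization trade-off and three advantages of decentralization.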
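For readers unfamiliar with D-SGD, the update it refers to can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a ring topology with uniform 1/3 mixing weights, a toy quadratic loss per node, and the common "gossip-average, then take a local gradient step" form of the update. The names `ring_gossip_matrix`, `dsgd`, and `targets` are hypothetical. It shows only the D-SGD iteration itself, not the equivalence with average-direction SAM.

```python
import numpy as np


def ring_gossip_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology).

    Each node puts weight 1/3 on itself and on each of its two ring neighbours.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i, (i - 1) % n, (i + 1) % n):
            W[i, j] += 1.0 / 3.0  # += keeps rows summing to 1 even for n < 3
    return W


def dsgd(targets, lr=0.1, steps=200, noise=0.0, seed=0):
    """Run D-SGD on toy local losses f_i(x) = 0.5 * ||x - targets[i]||^2.

    targets has shape (n_nodes, dim); row i is the minimizer of node i's loss.
    Returns the local models, one row per node.
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    W = ring_gossip_matrix(n)
    x = np.zeros((n, d))  # every node starts from the same point
    for _ in range(steps):
        # Local gradient of the quadratic loss, plus optional Gaussian noise
        # standing in for mini-batch stochasticity (an assumption of this toy).
        grads = (x - targets) + noise * rng.standard_normal((n, d))
        # Average with neighbours (no central server), then step locally.
        x = W @ x - lr * grads
    return x
```

With `noise=0.0`, the average of the local models follows plain gradient descent on the average loss, because the mixing matrix is doubly stochastic. The individual local models stay close to that average but do not collapse onto it exactly under a constant learning rate.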
Abstract
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server. However, existing theories claim that decentralization in