BriefGPT.xyz
May, 2023
Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function
Linxuan Pan, Shenghui Song
TL;DR
Through theoretical analysis and experiments, the paper shows that local SGD (L-SGD) can exploit the second-order information of the loss function more effectively, and therefore converges faster than stochastic gradient descent (SGD).
Abstract
With multiple iterations of updates, local statistical gradient descent (L-SGD) has been proven to be very effective in distributed machine learning schemes such as …
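To make the setting concrete, below is a minimal sketch of L-SGD in the federated-averaging style the abstract alludes to: each worker runs several local SGD steps on its own data shard before a server averages the resulting models. The quadratic least-squares objective, the hyperparameters, and all function and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_worker_data(n_workers=4, n_samples=256, dim=10):
    """Synthetic least-squares data split across workers (illustrative)."""
    w_true = rng.normal(size=dim)
    data = []
    for _ in range(n_workers):
        X = rng.normal(size=(n_samples, dim))
        y = X @ w_true + 0.1 * rng.normal(size=n_samples)
        data.append((X, y))
    return data, w_true

def local_sgd_round(w, data, local_steps=10, lr=0.01, batch=32):
    """One communication round of L-SGD: local SGD steps, then averaging."""
    local_models = []
    for X, y in data:
        w_k = w.copy()
        for _ in range(local_steps):
            # Mini-batch gradient of the least-squares loss on this worker.
            idx = rng.integers(0, X.shape[0], size=batch)
            grad = X[idx].T @ (X[idx] @ w_k - y[idx]) / batch
            w_k -= lr * grad
        local_models.append(w_k)
    # Server aggregates by averaging the workers' models.
    return np.mean(local_models, axis=0)

data, w_true = make_worker_data()
w = np.zeros_like(w_true)
for _ in range(50):
    w = local_sgd_round(w, data)
print("distance to optimum:", np.linalg.norm(w - w_true))
```

Setting `local_steps=1` recovers ordinary distributed mini-batch SGD, which is the baseline the paper's convergence comparison refers to.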