BriefGPT.xyz
Aug 2017
On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong
TL;DR
A synchronous K-step averaging stochastic gradient descent algorithm (K-AVG) is applied to machine learning problems. The convergence of K-AVG is proven, the need for the K-step delay is explained, and K-AVG is shown to outperform ASGD on large-scale datasets.
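To make the scheme concrete, here is a minimal sketch of K-step averaging SGD as summarized above: each of P parallel learners takes K local SGD steps on its own minibatches, and then the parameter vectors are averaged at a synchronization point. The least-squares objective, hyperparameters, and function names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Synthetic least-squares problem (illustrative, not from the paper).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.1 * rng.normal(size=200)

def grad(x, idx):
    """Stochastic gradient of 0.5 * ||A x - b||^2 on minibatch idx."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def k_avg_sgd(P=4, K=8, rounds=50, lr=0.05, batch=16):
    x = np.zeros(5)                      # shared starting point
    for _ in range(rounds):
        local_params = []
        for _ in range(P):               # each learner works independently
            xl = x.copy()
            for _ in range(K):           # K local SGD steps before averaging
                idx = rng.integers(0, len(b), size=batch)
                xl -= lr * grad(xl, idx)
            local_params.append(xl)
        x = np.mean(local_params, axis=0)  # synchronous averaging step
    return x

x_hat = k_avg_sgd()
print(np.linalg.norm(A @ x_hat - b) ** 2 / len(b))
```

Note that K trades off communication cost against staleness: larger K means fewer synchronizations, but each learner drifts further from the averaged iterate between rounds.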
Abstract
Despite their popularity, the practical performance of asynchronous stochastic gradient descent methods (ASGD) for solving large-scale machine learning problems is not as good as theoretical results indicate.