BriefGPT.xyz
Aug 2017
On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong
TL;DR
A synchronous K-step averaging stochastic gradient descent algorithm (K-AVG) is applied to machine learning problems. The convergence of K-AVG is proven, the need for the K-step delay is explained, and K-AVG is shown to outperform ASGD on large-scale datasets.
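To make the scheme concrete, here is a minimal sketch of K-step averaging SGD as summarized above: each of P parallel learners takes K local SGD steps on its own minibatches, and then the parameter vectors are averaged at a synchronization point. The least-squares objective, hyperparameters, and function names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Synthetic least-squares problem (illustrative, not from the paper).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.1 * rng.normal(size=200)

def grad(x, idx):
    """Stochastic gradient of 0.5 * ||A x - b||^2 on minibatch idx."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def k_avg_sgd(P=4, K=8, rounds=50, lr=0.05, batch=16):
    x = np.zeros(5)                      # shared starting point
    for _ in range(rounds):
        local_params = []
        for _ in range(P):               # each learner works independently
            xl = x.copy()
            for _ in range(K):           # K local SGD steps before averaging
                idx = rng.integers(0, len(b), size=batch)
                xl -= lr * grad(xl, idx)
            local_params.append(xl)
        x = np.mean(local_params, axis=0)  # synchronous averaging step
    return x

x_hat = k_avg_sgd()
print(np.linalg.norm(A @ x_hat - b) ** 2 / len(b))
```

Note that K trades off communication cost against staleness: larger K means fewer synchronizations, but each learner drifts further from the averaged iterate between rounds.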
Abstract
Despite their popularity, the practical performance of asynchronous stochastic gradient descent methods (ASGD) for solving large-scale machine learning problems is not as good as theoretical results indicate.