Unified Optimal Analysis of the (Stochastic) Gradient Method

July 2019

Sebastian U. Stich

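TL;DR: We prove that, under an $L$-smoothness assumption, the iterates of stochastic gradient descent converge at a rate of order $O\left( L R^2 \exp\bigl[-\frac{\mu}{4L} T\bigr] + \frac{\sigma^2}{\mu T} \right)$, where $R = \|x_0 - x^\star\|$ and $\sigma^2$ is the variance of the stochastic noise. This rate matches the best known iteration complexities of GD and SGD.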
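The Python sketch below illustrates the two terms of the rate on a toy problem. It is not the paper's algorithm or stepsize schedule. It runs gradient descent ($\sigma = 0$) and SGD on a hypothetical diagonal quadratic whose minimizer is $x^\star = 0$, with $\mu$ and $L$ equal to the smallest and largest diagonal entries. Everything in it is an illustrative assumption: the quadratic, the Gaussian noise model, the constant stepsize $\gamma = 1/(4L)$, and the helper names `sgd_on_quadratic` and `rate_shape`. On this quadratic, gradient descent with $\gamma = 1/(4L)$ satisfies $f(x_T) - f(x^\star) \le \frac{L}{2} R^2 \exp[-\frac{\mu}{2L} T]$. That quantity is below the first term of the stated rate, so the deterministic run can be checked directly against it.

```python
import math
import random


def sgd_on_quadratic(eigs, x0, steps, gamma, sigma=0.0, seed=0):
    """Run (stochastic) gradient descent on f(x) = 0.5 * sum_i eigs[i] * x[i]**2.

    The minimizer is x* = 0 with f(x*) = 0; f is mu-strongly convex and
    L-smooth with mu = min(eigs), L = max(eigs). Each stochastic gradient is the
    exact gradient plus i.i.d. Gaussian noise with standard deviation
    sigma / sqrt(d) per coordinate, so E||g - grad f(x)||^2 = sigma^2
    (an assumed noise model). sigma = 0 gives plain gradient descent.
    Returns the final suboptimality f(x_T) - f(x*).
    """
    rng = random.Random(seed)
    d = len(eigs)
    x = list(x0)
    for _ in range(steps):
        for i in range(d):
            noise = rng.gauss(0.0, sigma / math.sqrt(d)) if sigma > 0 else 0.0
            x[i] -= gamma * (eigs[i] * x[i] + noise)
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))


def rate_shape(L, mu, R2, sigma2, T):
    """Stated rate L R^2 exp(-mu T / (4L)) + sigma^2 / (mu T), hidden constants set to 1."""
    return L * R2 * math.exp(-mu * T / (4 * L)) + sigma2 / (mu * T)


if __name__ == "__main__":
    eigs = [1.0, 2.0, 10.0]       # hypothetical spectrum: mu = 1, L = 10
    x0 = [1.0, -1.0, 0.5]
    L, mu = max(eigs), min(eigs)
    R2 = sum(v * v for v in x0)   # R^2 = ||x0 - x*||^2 because x* = 0
    gamma = 1.0 / (4.0 * L)       # assumed constant stepsize, not the paper's schedule
    for T in (10, 50, 200):
        gd = sgd_on_quadratic(eigs, x0, T, gamma)
        noisy = sgd_on_quadratic(eigs, x0, T, gamma, sigma=0.1)
        print(T, gd, rate_shape(L, mu, R2, 0.0, T), noisy)
```

Reading off the two terms, reaching accuracy $\varepsilon$ takes roughly $T = O\bigl(\frac{L}{\mu}\log\frac{L R^2}{\varepsilon} + \frac{\sigma^2}{\mu \varepsilon}\bigr)$ iterations. The first term is the usual linear rate of GD and the second is the usual $O(1/T)$ rate of SGD. With a constant stepsize the noisy run stalls at a floor proportional to $\gamma\sigma^2$. Attaining the $\sigma^2/(\mu T)$ term requires a decreasing stepsize schedule, which this sketch does not implement.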

Abstract

In this note we give a simple proof for the convergence of stochastic gradient (SGD) methods on $\mu$-strongly convex functions under a (milder than standard) $L$-smoothness assumption. We show that SGD converges after $T$ iterations as $O\left( L \|x_0-x^\star\|^2 \exp \bigl[-\frac{\m