The Complexity of Finding Stationary Points with Stochastic Gradient Descent

Oct, 2019

The Complexity of Finding Stationary Points with Stochastic Gradient Descent

Yoel Drori, Ohad Shamir

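TL;DR: This work studies the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions. The results show that the upper bound of Ghadimi and Lan cannot be improved without additional assumptions, even for convex quadratic functions. They also show that, for nonconvex functions, whether SGD can feasibly minimize gradients depends on the optimality criterion chosen.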
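To make the setting concrete, here is a minimal sketch of SGD on a convex quadratic with a noisy gradient oracle, tracking the smallest true gradient norm along the trajectory. It is only an illustration under stated assumptions, not the construction used in the paper: the function name `sgd_min_grad_norm`, the curvatures, the Gaussian noise model, the constant step size, and all default values are hypothetical choices.

```python
import random


def sgd_min_grad_norm(dim=5, steps=2000, step_size=0.01, noise_std=1.0, seed=0):
    """Run SGD on f(x) = 0.5 * sum(a_i * x_i^2) with noisy gradients.

    Returns the smallest true gradient norm observed along the trajectory.
    """
    rng = random.Random(seed)
    # Assumed curvatures a_i in (0, 1], so f is convex and 1-smooth.
    curvatures = [(i + 1) / dim for i in range(dim)]
    x = [1.0] * dim
    best = float("inf")
    for _ in range(steps):
        # True gradient of the quadratic at the current iterate.
        grad = [a * xi for a, xi in zip(curvatures, x)]
        norm = sum(g * g for g in grad) ** 0.5
        best = min(best, norm)
        # Assumed stochastic oracle: true gradient plus zero-mean Gaussian noise.
        x = [
            xi - step_size * (g + rng.gauss(0.0, noise_std))
            for xi, g in zip(x, grad)
        ]
    return best


if __name__ == "__main__":
    print("noisy oracle:    ", sgd_min_grad_norm(noise_std=1.0))
    print("noiseless oracle:", sgd_min_grad_norm(noise_std=0.0))
```

Comparing the noisy and noiseless runs gives a rough feel for how gradient noise limits how small the gradient norm gets within a fixed iteration budget, which is the quantity the iteration-complexity bounds concern.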

Abstract

We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions