The classical convergence analysis of SGD is carried out under the assumption
that the norm of the stochastic gradient is uniformly bounded. While this might
hold for some loss functions, it is violated for cases where the objective
function is strongly convex. In Bottou et al. (2018), a new analysis of
convergence of SGD is performed under the assumption that stochastic gradients
are bounded with respect to the true gradient norm. We show that for stochastic
problems arising in machine learning such a bound always holds, and we also
propose an alternative convergence analysis of SGD in the diminishing learning
rate regime. We then move on to the asynchronous parallel setting and prove
convergence of the Hogwild! algorithm in the same regime, i.e., with diminishing
learning rates. It is well known that SGD converges if the sequence of learning
rates $\{\eta_t\}$ satisfies $\sum_{t=0}^\infty \eta_t = \infty$ and
$\sum_{t=0}^\infty \eta^2_t < \infty$. We show the convergence of SGD for
strongly convex objective functions without the bounded gradient assumption
when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty \eta_t
= \infty$. In other words, we extend the current state-of-the-art class of
learning rates for which SGD is guaranteed to converge.
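To make the learning-rate discussion concrete, here is a minimal Python sketch, not the authors' code: plain SGD on a hypothetical one-dimensional strongly convex objective $F(w) = \frac{1}{n}\sum_i \frac{1}{2}(w - a_i)^2$ with the step size $\eta_t = \eta_0/\sqrt{t+1}$. This schedule is diminishing and $\sum_t \eta_t = \infty$, but $\sum_t \eta_t^2 = \infty$ as well (it grows like $\log T$), so it lies outside the classical conditions while still belonging to the diminishing class described above. Note also that the stochastic gradient $w - a_i$ grows without bound as $|w| \to \infty$, which is why a uniform gradient bound fails here. The objective, the data, the constant $\eta_0$ and the name `sgd_diminishing` are assumptions for illustration; a single run illustrates the behavior and proves nothing.

```python
import math
import random


def sgd_diminishing(data, num_steps=100_000, eta0=0.5, w0=10.0, seed=0):
    """Plain SGD on the (hypothetical) strongly convex toy objective
    F(w) = mean_i 0.5 * (w - a_i)^2, whose minimizer is mean(data).

    Step size eta_t = eta0 / sqrt(t + 1): diminishing, sum diverges,
    and the sum of squares also diverges (grows like log t).
    """
    rng = random.Random(seed)
    w = w0
    sum_eta = 0.0      # partial sum of eta_t
    sum_eta_sq = 0.0   # partial sum of eta_t^2
    for t in range(num_steps):
        eta = eta0 / math.sqrt(t + 1)
        a = rng.choice(data)   # sample one component function f(w; a)
        w -= eta * (w - a)     # stochastic gradient of 0.5 * (w - a)^2 is (w - a)
        sum_eta += eta
        sum_eta_sq += eta * eta
    return w, sum_eta, sum_eta_sq


data = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data; minimizer of F is 3.0
w, s1, s2 = sgd_diminishing(data)
print(f"w = {w:.3f}, sum eta = {s1:.1f}, sum eta^2 = {s2:.2f}")
```

The returned partial sums show $\sum \eta_t$ growing like $\sqrt{T}$ and $\sum \eta_t^2$ growing like $\log T$, while the iterate still settles near the minimizer 3.0. A schedule such as $\eta_0/(t+1)$ would satisfy both classical conditions; the square-root schedule was chosen precisely because it does not.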

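This paper analyzes the convergence of stochastic gradient descent (SGD). It shows that the assumption that stochastic gradients are bounded with respect to the true gradient norm always holds for stochastic problems arising in machine learning, proves the convergence of SGD in several settings, and extends the current class of learning rates for which SGD is guaranteed to converge.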


New Convergence Aspects of Stochastic Gradient Algorithms

Stochastic gradient descent (SGD) is the optimization algorithm of choice in
many machine learning applications such as regularized empirical risk
minimization and training deep neural networks. The classical convergence
analysis of SGD is carried out under the assumption that the norm of the
stochastic gradient is uniformly bounded. While this might hold for some loss
functions, it is always violated for cases where the objective function is
strongly convex. In (Bottou et al., 2016), a new analysis of convergence of SGD
is performed under the assumption that stochastic gradients are bounded with
respect to the true gradient norm. Here we show that for stochastic problems
arising in machine learning such a bound always holds, and we also propose an
alternative convergence analysis of SGD in the diminishing learning rate regime,
which results in more relaxed conditions than those in (Bottou et al., 2016). We
then move on to the asynchronous parallel setting and prove convergence of the
Hogwild! algorithm in the same regime, obtaining the first convergence results
for this method in the case of diminishing learning rates.
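The two claims about gradient bounds (a uniform bound fails under strong convexity, while a bound relative to the true gradient always holds) can be checked with a short derivation. It uses notation and assumptions introduced here, not taken from the abstract: $F(w) = \mathbb{E}_\xi[f(w;\xi)]$ is $\mu$-strongly convex with minimizer $w_*$, and each component $f(\cdot;\xi)$ is convex and $L$-smooth. It is one standard argument under these assumptions and may differ from the paper's own proof. Step (2) uses $\|a\|^2 \le 2\|a-b\|^2 + 2\|b\|^2$, the inequality $\|\nabla f(w;\xi) - \nabla f(w_*;\xi)\|^2 \le 2L\,[f(w;\xi) - f(w_*;\xi) - \langle \nabla f(w_*;\xi), w - w_* \rangle]$ for convex $L$-smooth $f$, and $\mathbb{E}\,\nabla f(w_*;\xi) = \nabla F(w_*) = 0$.

```latex
% Notation introduced for this sketch (assumptions, not from the abstract):
%   F(w) = E_xi[ f(w; xi) ], minimizer w_*, F mu-strongly convex,
%   each f(. ; xi) convex and L-smooth.

% (1) No uniform bound: by Jensen's inequality and strong convexity,
\mathbb{E}\|\nabla f(w;\xi)\|^2 \;\ge\; \|\nabla F(w)\|^2 \;\ge\; \mu^2 \|w - w_*\|^2 ,
% which is unbounded over \mathbb{R}^d.

% (2) Bound relative to the optimality gap:
\mathbb{E}\|\nabla f(w;\xi)\|^2
  \;\le\; 2\,\mathbb{E}\|\nabla f(w;\xi) - \nabla f(w_*;\xi)\|^2 + 2\,\mathbb{E}\|\nabla f(w_*;\xi)\|^2
  \;\le\; 4L\,[F(w) - F(w_*)] + N,
\qquad N := 2\,\mathbb{E}\|\nabla f(w_*;\xi)\|^2 .

% (3) Strong convexity gives F(w) - F(w_*) \le \|\nabla F(w)\|^2 / (2\mu), hence
\mathbb{E}\|\nabla f(w;\xi)\|^2 \;\le\; N + \frac{2L}{\mu}\,\|\nabla F(w)\|^2 .
```

Line (3) is a bound with respect to the true gradient norm, with constants determined by $L$, $\mu$ and the gradient noise $N$ at the optimum.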

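This paper discusses the convergence analysis of stochastic gradient descent and proves the convergence of the Hogwild! algorithm in the asynchronous parallel setting with diminishing learning rates.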
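For readers unfamiliar with Hogwild!, the following minimal Python sketch shows its defining pattern: several workers read and update shared parameters with no locking, so an update may be computed from stale values or partially overwritten by another worker. The toy objective, the per-worker step size $\eta_0/\sqrt{t+1}$, the function name `hogwild_sgd` and all constants are assumptions for illustration; this is not the authors' implementation, and it does not reproduce the step-size conditions of their analysis. In CPython the global interpreter lock interleaves threads instead of running them in parallel, so the sketch shows the lock-free update rule rather than any speedup.

```python
import math
import random
import threading


def hogwild_sgd(data, num_workers=4, steps_per_worker=10_000, eta0=0.5, seed=0):
    """Toy Hogwild!-style SGD on the hypothetical objective
    F(w) = mean_i 0.5 * ||w - a_i||^2, whose minimizer is the mean of data.

    All workers share the list `w` and write to it without any lock.
    """
    dim = len(data[0])
    w = [0.0] * dim  # shared parameters, deliberately unprotected

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        for t in range(steps_per_worker):
            # Diminishing step indexed by this worker's local counter
            # (a simplification chosen for the sketch).
            eta = eta0 / math.sqrt(t + 1)
            a = data[rng.randrange(len(data))]
            snapshot = list(w)  # lock-free read; may be stale or inconsistent
            for j in range(dim):
                # Gradient of 0.5 * ||w - a||^2 is (w - a), evaluated at the
                # snapshot, then applied to the live shared vector without a lock.
                w[j] -= eta * (snapshot[j] - a[j])

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w


data = [(1.0, -1.0), (3.0, 1.0), (2.0, 0.0)]  # hypothetical points; minimizer is (2.0, 0.0)
w = hogwild_sgd(data)
print([round(x, 3) for x in w])
```

Practical Hogwild! implementations typically rely on sparse gradients, so that concurrent workers rarely touch the same coordinates; the dense toy gradient here only keeps the example short.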