Stochastic gradient methods have increasingly become popular for large-scale optimization. However, they are often numerically unstable because of their sensitivity to hyperparameters in the learning rate; furthermore they are statistically inefficient because of their suboptimal usage of the data's information. We propose a new learning procedure, termed averaged implicit stochastic gradient descent (ai-SGD), which combines stability through proximal (implicit) updates and statistical efficiency through averaging of the iterates. In an asymptotic analysis we prove convergence of the procedure and show that it is statistically optimal, i.e., it achieves the Cramer-Rao lower variance bound. In a non-asymptotic analysis, we show that the stability of ai-SGD is due to its robustness to misspecifications of the learning rate with respect to the convexity of the loss function. Our experiments demonstrate that ai-SGD performs on par with state-of-the-art learning methods. Moreover, ai-SGD is more stable than averaging methods that do not utilize proximal updates, and it is simpler and computationally more efficient than methods that do employ proximal updates in an incremental fashion.

提出一种基于平均隐式（averaged implicit）随机梯度下降的迭代过程，旨在解决参数估计过程中的数值不稳定性和统计效率问题。实践证明这种方法比其他现有方法表现更为出色。

随机梯度下降的稳定性和最优性