Nov 2018
On exponential convergence of SGD in non-convex over-parametrized learning
Raef Bassily, Mikhail Belkin, Siyuan Ma
TL;DR
This paper studies the convergence rate of SGD for learning large over-parametrized models, and shows that SGD with a constant step size achieves exponential convergence when the loss function is convex or belongs to a broad class of non-convex functions satisfying the Polyak-Lojasiewicz condition.
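For reference, the Polyak-Lojasiewicz (PL) condition mentioned in the TL;DR is usually stated as follows (a standard formulation; the parameter \mu, step size \eta, and constant c are generic symbols, not notation taken from this page):

\[
  \tfrac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\ge\; \mu\,\bigl(f(w) - f^{*}\bigr)
  \quad \text{for all } w, \qquad f^{*} = \min_{w} f(w).
\]

Under such a condition, constant-step-size (stochastic) gradient analyses typically give a geometric ("exponential") rate of the form

\[
  \mathbb{E}\bigl[f(w_{t})\bigr] - f^{*} \;\le\; (1 - c\,\eta\,\mu)^{t}\,\bigl(f(w_{0}) - f^{*}\bigr)
\]

for a sufficiently small fixed step size \eta and some constant c > 0.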
Abstract
Large over-parametrized models learned via stochastic gradient descent (SGD) methods have become a key element in modern machine learning. Although SGD methods are very effective in practice, most theoretical analyses …
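As a rough, self-contained illustration (not code from the paper or this page), the following Python sketch runs constant-step-size SGD on a noiseless over-parametrized least-squares problem; the dimensions, step size, and variable names are all hypothetical choices for this toy setting. The printed training loss decays geometrically, which is the exponential-convergence behavior described in the TL;DR.

# Minimal sketch: constant-step-size SGD on an over-parametrized
# least-squares problem, where exact interpolation is possible.
import numpy as np

rng = np.random.default_rng(0)

n, d = 50, 500                      # fewer samples than parameters (over-parametrized)
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = X @ w_star                      # noiseless labels, so zero training loss is attainable

w = np.zeros(d)
eta = 0.5                           # constant step size, picked by hand for this toy problem

def loss(w):
    # average squared error over the training set
    return 0.5 * np.mean((X @ w - y) ** 2)

for epoch in range(30):
    for i in rng.permutation(n):    # single-sample SGD over one shuffled pass
        g = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
        w -= eta * g
    if epoch % 5 == 0:
        print(f"epoch {epoch:2d}  train loss {loss(w):.3e}")

With noiseless labels and d > n, the iterates can fit the data exactly, so the per-step variance vanishes at the minimizer and the fixed step size keeps working; the loss printout shrinks by a roughly constant factor per epoch rather than at the usual sublinear SGD rate.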