Nov 2018
On exponential convergence of SGD in non-convex over-parametrized learning
Raef Bassily, Mikhail Belkin, Siyuan Ma
TL;DR
This paper studies the convergence rate of SGD for learning large over-parametrized models, and shows that SGD with a constant step size achieves exponential convergence when the loss function is convex or belongs to a broad class of non-convex functions satisfying the Polyak-Lojasiewicz condition.
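For reference, the Polyak-Lojasiewicz (PL) condition mentioned in the TL;DR is usually stated as follows (a standard formulation; the parameter \mu, step size \eta, and constant c are generic symbols, not notation taken from this page):

\[
  \tfrac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\ge\; \mu\,\bigl(f(w) - f^{*}\bigr)
  \quad \text{for all } w, \qquad f^{*} = \min_{w} f(w).
\]

Under such a condition, constant-step-size (stochastic) gradient analyses typically give a geometric ("exponential") rate of the form

\[
  \mathbb{E}\bigl[f(w_{t})\bigr] - f^{*} \;\le\; (1 - c\,\eta\,\mu)^{t}\,\bigl(f(w_{0}) - f^{*}\bigr)
\]

for a sufficiently small fixed step size \eta and some constant c > 0.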
Abstract
Large over-parametrized models learned via stochastic gradient descent (SGD) methods have become a key element in modern machine learning. Although SGD methods are very effective in practice, most theoretical analyses …
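As a rough, self-contained illustration (not code from the paper or this page), the following Python sketch runs constant-step-size SGD on a noiseless over-parametrized least-squares problem; the dimensions, step size, and variable names are all hypothetical choices for this toy setting. The printed training loss decays geometrically, which is the exponential-convergence behavior described in the TL;DR.

# Minimal sketch: constant-step-size SGD on an over-parametrized
# least-squares problem, where exact interpolation is possible.
import numpy as np

rng = np.random.default_rng(0)

n, d = 50, 500                      # fewer samples than parameters (over-parametrized)
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = X @ w_star                      # noiseless labels, so zero training loss is attainable

w = np.zeros(d)
eta = 0.5                           # constant step size, picked by hand for this toy problem

def loss(w):
    # average squared error over the training set
    return 0.5 * np.mean((X @ w - y) ** 2)

for epoch in range(30):
    for i in rng.permutation(n):    # single-sample SGD over one shuffled pass
        g = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
        w -= eta * g
    if epoch % 5 == 0:
        print(f"epoch {epoch:2d}  train loss {loss(w):.3e}")

With noiseless labels and d > n, the iterates can fit the data exactly, so the per-step variance vanishes at the minimizer and the fixed step size keeps working; the loss printout shrinks by a roughly constant factor per epoch rather than at the usual sublinear SGD rate.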