Feb, 2018
An Alternative View: When Does SGD Escape Local Minima?
Robert Kleinberg, Yuanzhi Li, Yang Yuan
TL;DR
This paper proves that stochastic gradient descent works on a class of non-convex functions, which helps explain why SGD performs so well on neural networks.
Abstract
Stochastic gradient descent (SGD) is widely used in machine learning. Although commonly viewed as a fast but less accurate version of gradient descent (GD), it always finds better solutions than GD for modern neural networks.
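The abstract's claim can be illustrated with a toy experiment. The sketch below uses a hypothetical 1-D function of our own choosing (not one from the paper) with a shallow local minimum and a deeper global minimum: deterministic GD started in the shallow basin stays there, while gradient noise, standing in for minibatch noise, lets some runs hop the barrier.

```python
import random

# Toy objective (our own illustration): f(x) = x^4 - 4x^2 + x.
# It has a shallow local minimum near x ≈ 1.35, a deeper global
# minimum near x ≈ -1.47, and a barrier between them near x ≈ 0.13.
def grad(x):
    return 4 * x**3 - 8 * x + 1

def descend(x, lr=0.05, steps=300, noise_std=0.0, rng=None):
    """Plain GD when noise_std == 0; noisy (SGD-like) descent otherwise."""
    for _ in range(steps):
        g = grad(x)
        if noise_std:
            g += rng.gauss(0.0, noise_std)  # stand-in for minibatch noise
        x -= lr * g
    return x

# Deterministic GD from the right basin converges to the shallow minimum.
x_gd = descend(1.5)

# Noisy descent from the same start escapes into the deeper basin
# (x < 0) on a fraction of runs.
rng = random.Random(0)
escapes = sum(descend(1.5, noise_std=8.0, rng=rng) < 0 for _ in range(50))

print(f"GD endpoint: {x_gd:.3f}, noisy runs ending in deep basin: {escapes}/50")
```

The noise scale here is deliberately large so escapes are visible within a few hundred steps; with small noise, escape times grow sharply with the barrier height.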