Oct, 2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh
TL;DR
This paper shows that, for neural networks with ReLU activations, randomly initialized gradient descent converges to a global optimum at a global linear rate, even though the objective is non-convex and non-smooth. The analysis relies on the over-parameterization of the network and the random initialization scheme, and these insights may also help in analyzing deep networks and other first-order methods.
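For reference, a hedged sketch of the two-layer setting and the form of the convergence guarantee (notation such as $m$, $\eta$, $\lambda_0$, and $\mathbf{H}^{\infty}$ follows the paper; the exact width and step-size requirements are polynomial in the number of samples and are stated precisely in the paper):

$$f(\mathbf{W},\mathbf{a},\mathbf{x}) = \frac{1}{\sqrt{m}}\sum_{r=1}^{m} a_r\,\sigma\!\left(\mathbf{w}_r^{\top}\mathbf{x}\right), \qquad L(\mathbf{W}) = \frac{1}{2}\sum_{i=1}^{n}\bigl(f(\mathbf{W},\mathbf{a},\mathbf{x}_i)-y_i\bigr)^{2},$$

and, with high probability over the random initialization,

$$L(\mathbf{W}(k)) \le \Bigl(1-\frac{\eta\lambda_0}{2}\Bigr)^{k} L(\mathbf{W}(0)),$$

where $\lambda_0 > 0$ is the least eigenvalue of the Gram matrix $\mathbf{H}^{\infty}$ determined by the training data and the ReLU activation.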
Abstract
One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth.
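To illustrate the phenomenon described above, the following is a minimal NumPy sketch (not the paper's code; the dataset, width, and step size are assumed for the example): gradient descent from a random Gaussian initialization on a heavily over-parameterized two-layer ReLU network typically drives the squared training loss toward zero.

```python
# Minimal sketch (not from the paper): randomly initialized gradient descent on an
# over-parameterized two-layer ReLU network fitting a small synthetic dataset.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 10, 10, 1024                  # n samples, input dim d, hidden width m >> n
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)                  # arbitrary labels to fit

W = rng.normal(size=(m, d))             # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)     # output weights (held fixed, as in the two-layer analysis)

def predict(W):
    pre = X @ W.T                       # (n, m) pre-activations
    return np.maximum(pre, 0.0) @ a / np.sqrt(m)   # ReLU, then 1/sqrt(m)-scaled output layer

eta = 0.2                               # assumed step size; the theory requires it small enough
for step in range(2001):
    resid = predict(W) - y              # residual u - y
    mask = (X @ W.T > 0).astype(float)  # ReLU derivative at each (sample, unit)
    # dL/dW_r = (a_r / sqrt(m)) * sum_i resid_i * 1[w_r . x_i > 0] * x_i
    grad = ((resid[:, None] * mask) * (a / np.sqrt(m))).T @ X   # shape (m, d)
    W -= eta * grad
    if step % 500 == 0:
        print(f"step {step:4d}  loss {0.5 * np.sum(resid**2):.6f}")
```

The $1/\sqrt{m}$ scaling and the fixed $\pm 1$ output layer mirror the setting analyzed in the paper; with the width large relative to the sample count, the printed loss should decrease roughly geometrically, consistent with the linear convergence rate in the TL;DR.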