BriefGPT.xyz
Jan, 2019
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh
TL;DR
This work shows that stochastic gradient descent (SGD) can train deep neural networks and even converge to a global minimum. The result builds on experiments verifying that SGD follows a star-convex path and that the training loss approaches zero, and it reveals in a new way that SGD converges to a global minimum in a deterministic manner.
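The star-convexity condition mentioned above states that along the SGD path, the loss at each iterate is upper-bounded by the linearization toward the global minimizer: f(x_k) - f(x*) <= <grad f(x_k), x_k - x*>. A minimal sketch, assuming a toy least-squares problem with known minimizer x* = 0 (not the paper's deep-network experiments), checks this condition numerically along an SGD trajectory:

```python
import numpy as np

# Toy illustration: verify the star-convexity condition
#   f(x_k) - f(x*) <= <grad f(x_k), x_k - x*>
# along an SGD path on a least-squares loss whose global
# minimizer is x* = 0 with f(x*) = 0. (Hypothetical example,
# not the paper's setup.)

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = np.zeros(50)  # targets chosen so that x* = 0, f(x*) = 0


def loss(x):
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)


def stoch_grad(x, batch):
    # mini-batch gradient of the mean-squared loss
    r = A[batch] @ x - b[batch]
    return A[batch].T @ r / len(batch)


x = rng.normal(size=10)
x_star = np.zeros(10)
star_convex_holds = []
for k in range(200):
    g_full = A.T @ (A @ x - b) / 50  # full gradient at x_k
    # star-convexity w.r.t. x* at the current iterate
    star_convex_holds.append(
        loss(x) - loss(x_star) <= g_full @ (x - x_star) + 1e-12
    )
    batch = rng.choice(50, size=8, replace=False)
    x -= 0.05 * stoch_grad(x, batch)

print(all(star_convex_holds))
```

For this convex quadratic the condition holds at every iterate; the paper's point is that it also holds empirically, epoch-wise, along SGD paths on nonconvex deep-network losses when the final training loss is near zero.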
Abstract
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding on how and why SGD can train these complex networks […]