BriefGPT.xyz
May, 2019
SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman...
TL;DR
An experimental study shows that stochastic gradient descent (SGD) learns classifiers of increasing complexity, starting from a linear classifier, as measured with conditional mutual information; this helps explain why classifiers learned by SGD in the over-parameterized regime tend to generalize well.
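The TL;DR above refers to measuring how much of a network's behavior is captured by a simpler (e.g. linear) classifier via mutual information between their predictions. As an illustrative sketch only (plain empirical mutual information between two discrete prediction vectors, not the paper's exact conditional estimator), one could compute:

```python
import numpy as np

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete
    label vectors, e.g. predictions of a network and a linear model."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))  # joint probability
            if p_ab > 0:
                p_a = np.mean(a == va)  # marginal of a
                p_b = np.mean(b == vb)  # marginal of b
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Perfectly agreeing binary classifiers share log(2) nats of information;
# independent predictions share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # → ~0.693
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # → 0.0
```

Tracking such a quantity across training epochs (between the SGD-trained network's predictions and a fixed linear classifier's predictions) is one way to observe the "linear phase" the paper describes.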
Abstract
We perform an experimental study of the dynamics of stochastic gradient descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almo…