Oct 2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora
TL;DR
This paper studies the global optimality of two-layer Leaky ReLU networks, proving that gradient flow on linearly separable, symmetric data converges to a globally optimal "max-margin" solution, and it also gives a theoretical explanation for the "simplicity bias" of gradient descent in the early phase of training.
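To make the setting described above concrete, the following is a minimal illustrative sketch (not the paper's code): a two-layer Leaky ReLU network trained by gradient descent on the logistic loss, starting from small random initialization on a linearly separable, symmetric toy dataset, while tracking the normalized margin. All specifics (width, learning rate, data distribution) are assumptions chosen for illustration only.

```python
# Sketch of the training setup from the TL;DR; hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # Leaky ReLU negative slope (assumed)

def leaky_relu(z):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z):
    return np.where(z > 0, 1.0, alpha)

# Symmetric, linearly separable toy data: every point x appears with -x under the opposite label.
X_pos = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(50, 2))
X = np.vstack([X_pos, -X_pos])
y = np.concatenate([np.ones(50), -np.ones(50)])

m = 64                                    # hidden width (assumed)
W = rng.normal(scale=1e-3, size=(m, 2))   # small random initialization
a = rng.normal(scale=1e-3, size=m)

def forward(X):
    return leaky_relu(X @ W.T) @ a        # network output f(x) = a^T sigma(W x)

lr = 0.2
for step in range(30001):
    pre = X @ W.T                         # (n, m) pre-activations
    out = leaky_relu(pre) @ a             # (n,) outputs
    # logistic loss: mean_i log(1 + exp(-y_i f(x_i)))
    g = -y / (1.0 + np.exp(y * out)) / len(y)            # dL/d out
    grad_a = leaky_relu(pre).T @ g
    grad_W = ((g[:, None] * a[None, :]) * leaky_relu_grad(pre)).T @ X
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 10000 == 0:
        margins = y * forward(X)
        norm_sq = np.sum(W**2) + np.sum(a**2)
        # normalized margin for a 2-homogeneous net: min_i y_i f(x_i) / ||theta||^2
        print(step, margins.min() / norm_sq)
```

In this kind of run the normalized margin typically keeps increasing long after the data are fit, which is the empirical behavior that margin-maximization results for homogeneous networks aim to explain.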
Abstract
The generalization mystery of overparametrized deep nets has motivated efforts to understand how gradient descent (GD) converges to low-loss solutions that generalize well. Real-life neural networks are initialized…