BriefGPT.xyz
Feb 2020
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Lenaic Chizat, Francis Bach
TL;DR
Analyzes the training and generalization behavior of two-layer neural networks with homogeneous activation functions in the infinite-width limit, showing that the limit of the gradient flow can be fully characterized as a max-margin classifier in a certain function space; in the presence of low-dimensional structure this classifier enjoys strong generalization bounds, matching the behavior of two-layer networks in practice and demonstrating the statistical benefit of this implicit bias.
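
To make the statement above concrete, here is a hedged sketch in our own notation (the parametrization, scaling, and the exact function space are assumptions on our part, not taken verbatim from the paper): a two-layer network with m units and a homogeneous activation trained on the logistic loss, and the max-margin problem that the gradient-flow limit is said to solve.

```latex
% Our notation, not the paper's: two-layer network with m units and a
% (positively) homogeneous activation \sigma (e.g. ReLU), data (x_i, y_i),
% labels y_i \in \{-1, +1\}.
\[
  h_m(\theta, x) \;=\; \frac{1}{m}\sum_{j=1}^{m} a_j\,\sigma(\langle w_j, x\rangle),
  \qquad
  L(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\log\bigl(1 + e^{-y_i\, h_m(\theta, x_i)}\bigr).
\]
% Claimed characterization of the infinite-width gradient-flow limit:
% a max-margin classifier over the unit ball of some function space
% (\mathcal{F}, \|\cdot\|_{\mathcal{F}}).
\[
  \max_{\|f\|_{\mathcal{F}} \le 1} \;\min_{1 \le i \le n}\; y_i\, f(x_i).
\]
```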
Abstract
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, …
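
As a purely illustrative sketch of the training setup the abstract alludes to (toy data, hyperparameters, and parametrization are our assumptions, not the authors' experiments): a wide two-layer ReLU network trained by full-batch gradient descent on the logistic loss.

```python
import numpy as np

# Illustrative sketch only: a wide two-layer ReLU network trained with the
# logistic loss by full-batch gradient descent on a toy 2D problem.
# Width, step size, and the 1/m output scaling are our assumptions.
rng = np.random.default_rng(0)

n, d, m = 200, 2, 1000            # samples, input dimension, hidden width
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] * X[:, 1])    # toy labels in {-1, +1} (XOR-like)

W = rng.normal(size=(m, d))       # hidden-layer weights
a = rng.normal(size=m) / m        # output weights with 1/m scaling

lr = 1.0
for step in range(2000):
    z = X @ W.T                   # (n, m) pre-activations
    h = np.maximum(z, 0.0)        # ReLU features
    f = h @ a                     # network outputs, shape (n,)
    s = -y / (1.0 + np.exp(y * f))            # d(logistic loss)/d f
    grad_a = h.T @ s / n
    grad_W = ((s[:, None] * (z > 0)) * a[None, :]).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

f_final = np.maximum(X @ W.T, 0.0) @ a
print("training accuracy:", np.mean(np.sign(f_final) == y))
```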