Mar 2018
Convergence of Gradient Descent on Separable Data
Mor Shpigel Nacson, Jason Lee, Suriya Gunasekar, Nathan Srebro, Daniel Soudry
TL;DR
A detailed study of the implicit bias of gradient descent on separable datasets for loss functions with strictly monotone tails (e.g., the logistic loss), proving that for a large family of super-polynomially tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the L2 maximum-margin solution.
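To make the claimed phenomenon concrete, here is a minimal sketch (not the paper's code) that runs plain gradient descent on the logistic loss over a toy separable dataset and compares the iterate's direction with the L2 max-margin (hard-margin SVM) direction. The dataset, step size, iteration count, and the scikit-learn comparison are illustrative assumptions.

```python
# Illustrative sketch: on separable data the logistic loss has no finite
# minimizer, so ||w_t|| diverges, but the direction w_t / ||w_t|| approaches
# the L2 max-margin separator. Toy data and hyperparameters are arbitrary.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Linearly separable 2-D dataset with labels y in {-1, +1}.
X = np.vstack([rng.normal([+2, +2], 0.5, (50, 2)),
               rng.normal([-2, -2], 0.5, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

def grad(w):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    coeffs = -y * expit(-y * (X @ w))   # ell'(m) = -1 / (1 + exp(m))
    return (coeffs[:, None] * X).mean(axis=0)

w, eta = np.zeros(2), 0.1
for _ in range(100_000):
    w -= eta * grad(w)

print("||w||:", np.linalg.norm(w))            # keeps growing with t
print("GD direction:", w / np.linalg.norm(w))

# Reference direction: hard-margin SVM (large C approximates hard margin).
from sklearn.svm import LinearSVC
svm = LinearSVC(C=1e6, loss="hinge", fit_intercept=False,
                max_iter=100_000).fit(X, y)
print("max-margin direction:",
      svm.coef_.ravel() / np.linalg.norm(svm.coef_))
```

The two printed directions should nearly coincide even as the weight norm keeps growing, which is the implicit-bias behavior the TL;DR describes for the logistic loss.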
Abstract
The implicit bias of gradient descent is not fully understood even in simple linear classification tasks (e.g., logistic regression). Soudry et al. (2018) studied this bias on separable data, where there are multiple solutions…