The Implicit Bias of Gradient Descent on Separable Data

Oct, 2017

Daniel Soudry, Elad Hoffer, Nathan Srebro

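TL;DR: This work shows that, for unregularized logistic regression on linearly separable datasets, gradient descent with homogeneous linear predictors converges to the direction of the max-margin solution. The convergence is slow. The result also applies to other monotonically decreasing loss functions, to multi-class problems, and to deep network training in certain restricted settings. The analysis can also help in understanding implicit regularization in models and with other optimization methods.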
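The claim can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: the three data points, the step size `lr`, and the step counts are assumptions chosen for illustration, and the max-margin vector `w_svm` is worked out by hand for this particular dataset. It runs plain gradient descent on the unregularized logistic loss with a homogeneous (bias-free) linear predictor and compares the direction of the iterate with the max-margin direction after a few steps and after many steps.

```python
import numpy as np

# Toy linearly separable data (hypothetical, chosen for this illustration).
X = np.array([[1.0, 0.0], [0.0, -2.0], [3.0, 3.0]])
y = np.array([1.0, -1.0, 1.0])
Z = X * y[:, None]  # rows are y_i * x_i

# Hard-margin solution for this data, derived by hand:
# minimizing ||w||^2 subject to Z @ w >= 1 gives w = (1, 0.5).
# The first two points are support vectors; the third is not.
w_svm = np.array([1.0, 0.5])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def run_gd(steps, lr=0.1):
    """Gradient descent on sum_i log(1 + exp(-y_i * w.x_i)), starting at w = 0."""
    w = np.zeros(2)
    for _ in range(steps):
        margins = Z @ w
        # d/dw log(1 + exp(-m_i)) = -z_i / (1 + exp(m_i))
        grad = -(Z * (1.0 / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
        w -= lr * grad
    return w


w_early = run_gd(100)
w_late = run_gd(100_000)
cos_early = cosine(w_early, w_svm)
cos_late = cosine(w_late, w_svm)

print("norm after 100 steps:    ", np.linalg.norm(w_early))
print("norm after 100000 steps: ", np.linalg.norm(w_late))
print("cosine to max-margin direction (100 steps):    ", cos_early)
print("cosine to max-margin direction (100000 steps): ", cos_late)
```

In this sketch the norm of the weights keeps growing while the direction drifts toward `w_svm`, and the remaining angular gap shrinks only slowly as the number of steps increases, which is consistent with the slow convergence noted above.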

Abstract

We show that gradient descent on an unregularized logistic regression problem with separable data converges to the max-margin solution. Th