Jun, 2020
Gradient Methods Never Overfit On Separable Data
Ohad Shamir
TL;DR
This paper shows that when linear predictors are trained on separable data using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor, and that standard gradient methods (in particular gradient flow, gradient descent, and stochastic gradient descent) never overfit a separable dataset, no matter how large the number of iterations.
Abstract
A line of recent works established that when training linear predictors over separable data, using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor.
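
To make the convergence-in-direction claim concrete, here is a minimal sketch (not from the paper; the toy dataset, step size, and iteration budget are arbitrary assumptions chosen for illustration). It runs full-batch gradient descent on the logistic loss, an exponentially-tailed loss, over a linearly separable dataset, and tracks how the normalized weight vector's direction and minimum margin stabilize as iterations grow.

```python
import numpy as np

# Hypothetical toy dataset (an assumption, not from the paper):
# two linearly separable Gaussian clusters in R^2 with labels +/-1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.3, (20, 2)),     # positive class
               rng.normal([-2.0, -2.0], 0.3, (20, 2))])  # negative class
y = np.concatenate([np.ones(20), -np.ones(20)])

def logistic_grad(w, X, y):
    """Gradient of (1/m) * sum_i log(1 + exp(-y_i <w, x_i>)),
    an exponentially-tailed loss of the kind the abstract refers to."""
    margins = y * (X @ w)
    # sigmoid(-z) = 0.5 * (1 - tanh(z / 2)), written via tanh for stability
    coeffs = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return (coeffs[:, None] * X).mean(axis=0)

w = np.zeros(2)
eta = 0.1  # arbitrary step size, chosen only for illustration
for t in range(1, 100_001):
    w -= eta * logistic_grad(w, X, y)
    if t in (100, 1_000, 10_000, 100_000):
        u = w / np.linalg.norm(w)  # direction of the current predictor
        print(f"t={t:>6}  direction={np.round(u, 4)}  "
              f"min normalized margin={np.min(y * (X @ u)):.4f}")
# ||w|| grows without bound, but its direction stabilizes and the minimum
# normalized margin increases toward the max-margin value, matching the
# convergence-in-direction result summarized above.
```

Because the loss has no finite minimizer on separable data, the iterate itself diverges; this is why the result, and the sketch's diagnostics, are stated in terms of the predictor's direction rather than the weight vector itself.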