BriefGPT.xyz
Jun, 2021
Label Noise SGD Provably Prefers Flat Global Minimizers
Alex Damian, Tengyu Ma, Jason Lee
TL;DR
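Studies how label noise and related noise sources regularize stochastic gradient descent in overparametrized models, and what effect that regularization has.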
Abstract
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies th