The Effect of Shape on the Implicit Bias of the Noise Covariance

June 2020

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

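TL;DR: In this paper, we theoretically prove that parameter-dependent noise in stochastic gradient descent (SGD), induced by mini-batches or label perturbation, is more effective than Gaussian noise and has an important implicit regularization effect when training overparameterized models.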
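To make "parameter-dependent noise" concrete, here is a minimal sketch, not the authors' experiment. It assumes a toy quadratically parameterized model f(x) = <v*v, x> with squared loss, chosen here purely for illustration; the function names and all numeric values are hypothetical. Perturbing the label by eps changes the gradient by an amount proportional to v * x, so the gradient noise vanishes on coordinates where v is zero, whereas additive Gaussian noise has the same scale on every coordinate regardless of v.

```python
import numpy as np

def label_noise_gradient(v, x, y, eps):
    """Gradient w.r.t. v of 0.25 * (<v*v, x> - (y + eps))**2 (toy model, assumed)."""
    residual = np.dot(v * v, x) - (y + eps)
    return residual * v * x

def gradient_noise_std(v, x, y, sigma, n_samples, rng):
    """Per-coordinate std of the gradient over random +/- sigma label perturbations."""
    eps = sigma * rng.choice([-1.0, 1.0], size=n_samples)
    grads = np.stack([label_noise_gradient(v, x, y, e) for e in eps])
    return grads.std(axis=0)

rng = np.random.default_rng(0)
v = np.array([1.0, 0.5, 0.0, 0.0])   # last two coordinates are exactly zero
x = np.array([1.0, -2.0, 3.0, 0.5])
y = float(np.dot(v * v, x))          # clean label fitted exactly by v

# Label-noise SGD: noise scale depends on the current parameters v.
label_std = gradient_noise_std(v, x, y, sigma=0.1, n_samples=1000, rng=rng)

# Gaussian-noise SGD (for comparison): same scale on every coordinate.
gaussian_std = np.full_like(v, 0.1)

print("label-noise std per coordinate:", label_std)
print("Gaussian-noise std per coordinate:", gaussian_std)
```

In this sketch the label-noise standard deviation is zero on the coordinates where v is zero and nonzero elsewhere, which is the sense in which the shape of the noise covariance depends on the parameters.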

Abstract

The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training overparameterized models. Pr