BriefGPT.xyz
Jun, 2023
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli
TL;DR
This work reveals a strong implicit bias in SGD that drives overly expressive neural networks toward much simpler subnetworks, drastically reducing the number of independent parameters and improving generalization.
Abstract
In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby drastically reducing the number of independent parameters and improving generalization. […]
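
As a minimal, generic illustration of the "gradient noise" the title refers to (not the paper's code, and the toy loss, data, and batch size below are assumptions for illustration only), the sketch compares one minibatch gradient with the full-batch gradient; their difference is the per-step noise that SGD injects into the dynamics.

    # Minimal sketch: minibatch gradients fluctuate around the full-batch gradient.
    # Toy linear-regression setup; all names here are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=256)

    def grad(w, idx):
        """Mean-squared-error gradient evaluated on the examples indexed by idx."""
        Xb, yb = X[idx], y[idx]
        return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

    w = rng.normal(size=8)
    full = grad(w, np.arange(len(X)))             # full-batch (noise-free) gradient
    mini = grad(w, rng.choice(len(X), size=16))   # one minibatch gradient

    # The gap between the two is the stochastic gradient noise at this step.
    print("gradient noise norm:", np.linalg.norm(mini - full))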