Oct, 2024
Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks
Ke Chen, Chugang Yi, Haizhao Yang
TL;DR
This work studies the implicit bias towards low-rank weight matrices that arises when neural networks are trained with weight decay (WD). We prove that the weight matrices of a sufficiently trained ReLU neural network are approximately rank two. Empirically, we show that WD is a necessary condition for inducing this low-rank bias in both regression and classification tasks, and we provide improved generalization error bounds.
Abstract
We study the implicit bias towards low-rank weight matrices when training Neural Networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with …
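The claim above is that, under weight decay, trained weight matrices become approximately rank two. A minimal sketch of how one might check such a claim numerically: compute the singular values of a weight matrix and count how many exceed a relative tolerance. The matrix below is a hypothetical stand-in (a rank-2 matrix plus small noise), not data from the paper, and the tolerance `tol` is an illustrative choice.

```python
import numpy as np

def numerical_rank(W, tol=1e-2):
    """Count singular values above tol * (largest singular value)."""
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Hypothetical weight matrix: exactly rank two plus small perturbation,
# mimicking the "approximately rank-two" structure described in the abstract.
rng = np.random.default_rng(0)
U = rng.standard_normal((64, 2))
V = rng.standard_normal((2, 64))
W = U @ V + 1e-4 * rng.standard_normal((64, 64))

print(numerical_rank(W))  # only the two dominant directions pass the threshold
```

In practice one would apply such a check to each layer's weight matrix after training, comparing runs with and without weight decay.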