BriefGPT.xyz
Jun, 2023
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma...
TL;DR
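When learning deep linear networks from linear measurements, minimizing the trace of the loss Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameter (the product of all layer matrices), which in turn leads to better generalization.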
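A minimal numerical sketch of the simplest (depth-2) case, under assumptions that are not taken from the paper: every entry of the target matrix is observed (the loss is ½‖W2W1 − M‖_F², an idealized stand-in for RIP measurements) and both layers are square d×d. At a zero-loss point the Hessian trace in this setting works out to d(‖W1‖_F² + ‖W2‖_F²), and over all factorizations of the same end-to-end matrix E this is smallest for the balanced SVD factorization, where it equals 2d‖E‖_* (the Schatten 1-norm). The helper names `loss` and `hessian_trace` are hypothetical, and this toy check does not reproduce the paper's general-depth analysis.

```python
import numpy as np


def loss(W2, W1, M):
    # Full-observation squared loss: 0.5 * ||W2 @ W1 - M||_F^2 (assumed setting)
    R = W2 @ W1 - M
    return 0.5 * np.sum(R * R)


def hessian_trace(W2, W1, M, h=1e-3):
    # Sum of the Hessian's diagonal entries, estimated by central differences.
    # Changing one entry of W1 or W2 moves W2 @ W1 affinely, so the loss is
    # exactly quadratic along each coordinate and the estimate is exact up to
    # floating-point rounding.
    base = loss(W2, W1, M)
    total = 0.0
    for W in (W1, W2):
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + h
            plus = loss(W2, W1, M)
            W[idx] = old - h
            minus = loss(W2, W1, M)
            W[idx] = old  # restore the original entry
            total += (plus - 2.0 * base + minus) / h**2
    return total


rng = np.random.default_rng(0)
d = 4                                   # square layers: W2 and W1 are d x d
W2 = rng.standard_normal((d, d))
W1 = rng.standard_normal((d, d))
E = W2 @ W1                             # end-to-end matrix; zero loss at M = E
nuc = np.linalg.norm(E, ord="nuc")      # Schatten 1-norm (nuclear norm)

# Balanced factorization of the same E built from its SVD
U, s, Vt = np.linalg.svd(E)
B2 = U * np.sqrt(s)                     # U @ diag(sqrt(s))
B1 = np.sqrt(s)[:, None] * Vt           # diag(sqrt(s)) @ Vt

# Unbalanced factorization of the same E: scale one layer by c, the other by 1/c
c = 3.0
tr_bal = hessian_trace(B2, B1, E)
tr_unbal = hessian_trace(c * W2, W1 / c, E)
print(tr_bal, 2 * d * nuc, tr_unbal)
```

In this toy setting the first two printed values agree up to rounding and the third is larger: rescaling the layers leaves the end-to-end matrix unchanged but raises the Hessian trace, so the flattest zero-loss factorization is the one whose trace equals 2d times the nuclear norm of E.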
Abstract
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its …