Apr, 2024
Decoupled Weight Decay for Any $p$ Norm
Nadav Joseph Outmezguine, Noam Levi
TL;DR
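The paper proposes a simple yet effective sparsification method, a weight decay scheme based on Bridge ($L_p$) regularization, to address the computational and storage requirements of large neural networks. The scheme avoids the gradient divergence that arises for $0<p<1$ regularization, and experiments show that it yields highly sparse networks while maintaining generalization performance comparable to standard $L_2$ regularization.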
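The divergence noted in the TL;DR can be seen directly from the derivative of the penalty term. As a brief illustration (the symbols $w$ for a single weight and $\lambda$ for the regularization strength are introduced here for the example and are not taken from this page):

$$
\frac{\partial}{\partial w}\,\lambda\,|w|^{p} \;=\; \lambda\, p\,\operatorname{sign}(w)\,|w|^{p-1}, \qquad w \neq 0 .
$$

For $0<p<1$ the exponent $p-1$ is negative, so the magnitude of this derivative grows without bound as $w \to 0$. For $p=2$ it reduces to the familiar $2\lambda w$, which vanishes at zero.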
Abstract
With the success of deep neural networks (NNs) in a variety of domains, the computational and storage requirements for training and deploying large NNs have become a bottleneck for further improvements.
sparsification