Nov 2020
Understanding and Scheduling Weight Decay
Stable Weight Decay Regularization
Zeke Xie, Issei Sato, Masashi Sugiyama
TL;DR
From the perspective of learning dynamics, this paper proposes a new theoretical interpretation of weight decay, derives a linear-scaling rule for weight decay in large-batch training, and introduces a stable weight-decay scheduling method (SWD) that, across a range of experiments, typically improves over both $L_{2}$ regularization and decoupled weight decay.
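To make the distinction concrete, the sketch below contrasts three couplings of weight decay in an Adam-style update: the classic $L_{2}$ penalty folded into the gradient, AdamW-style decoupled decay, and an SWD-style decay rescaled so its effective strength stays stable. This is a minimal illustration assuming a normalization by the square root of the mean second-moment estimate, based on the summary above; the function name `adam_step` and all hyperparameter defaults are illustrative, not the authors' reference implementation.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=5e-4, mode="swd"):
    """One Adam-style step under three weight-decay couplings.

    mode="l2"       : classic L2 penalty, folded into the gradient
    mode="decoupled": AdamW-style decay, applied outside the moments
    mode="swd"      : stable weight decay -- decay rescaled by the square
                      root of the mean second-moment estimate (an assumption
                      based on the paper's idea of keeping the effective
                      decay rate stable; not the reference code)
    """
    if mode == "l2":
        grad = grad + wd * theta               # penalty enters m and v
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    if mode == "decoupled":
        theta = theta - lr * wd * theta        # decay bypasses the moments
    elif mode == "swd":
        # Dividing by sqrt(mean(v_hat)) keeps the effective decay
        # strength comparable across training phases.
        theta = theta - lr * wd * theta / np.sqrt(v_hat.mean())
    return theta, m, v
```

Under this sketch, the paper's linear-scaling rule for large-batch training would amount to scaling `wd` with the batch-size ratio, e.g. `wd * (B / B_base)` for a hypothetical base batch size `B_base`.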
Abstract
Weight decay is a popular regularization technique for training of deep neural networks. Modern deep learning libraries mainly use $L_{2}$ regularization …