BriefGPT.xyz
Nov, 2018
Minimum norm solutions do not always generalize well for over-parameterized problems
Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi
TL;DR
We empirically find that, when training deep neural networks, adaptive methods can generalize better than stochastic gradient descent and require less tuning, while not necessarily yielding solutions with smaller weight norms.
Abstract
Stochastic gradient descent (SGD) is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine hyper-parameter tuning in order to achieve its best performance. This