June 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
Rachel Ward, Xiaoxia Wu, Leon Bottou
TL;DR
This paper proposes AdaGrad-Norm, a method for updating the stepsize in gradient descent that requires no fine-tuned stepsize schedule, converges for smooth nonconvex functions, and is robust.
Abstract
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use …
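As an illustration of the on-the-fly stepsize update described above, here is a minimal sketch of the AdaGrad-Norm recursion b_{j+1}^2 = b_j^2 + ||g_j||^2, x_{j+1} = x_j - (eta / b_{j+1}) g_j. The function names, default parameters, and the toy objective below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, num_steps=1000):
    """Sketch of the AdaGrad-Norm update with a single scalar stepsize.

    grad: function returning a (possibly stochastic) gradient at x.
    The accumulator b_j grows with the squared norms of all gradients seen
    so far, so the effective stepsize eta / b_j shrinks automatically
    without a hand-tuned schedule.
    """
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(num_steps):
        g = grad(x)
        b_sq += np.dot(g, g)               # b_{j+1}^2 = b_j^2 + ||g_j||^2
        x = x - (eta / np.sqrt(b_sq)) * g  # x_{j+1} = x_j - (eta / b_{j+1}) g_j
    return x

# Hypothetical usage on a smooth nonconvex toy objective
# f(x) = sum(x_i^2 + 0.5 * sin(3 * x_i)), whose gradient is 2x + 1.5 cos(3x).
f_grad = lambda x: 2 * x + 1.5 * np.cos(3 * x)
print(adagrad_norm(f_grad, x0=np.array([2.0, -1.0])))
```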