May 2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyu Li, Francesco Orabona
TL;DR
By studying generalized AdaGrad stepsizes in both the convex and non-convex settings, this paper establishes sufficient conditions under which these stepsizes guarantee asymptotic convergence of the gradients to zero, filling a gap in the theory of these methods. Moreover, it shows that these stepsizes automatically adapt to the noise level of the stochastic gradients in both the convex and non-convex cases, interpolating between O(1/T) and O(1/√T) rates, up to logarithmic terms.
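To make the stepsize scheme concrete, below is a minimal Python sketch of SGD with a global generalized-AdaGrad stepsize, assuming the time-delayed form η_t = α / (β + Σ_{i<t} ‖g_i‖²)^(1/2+ε) described in the paper; the function and hyperparameter names here are illustrative, not the authors' code.

```python
import numpy as np

def sgd_adagrad_global(grad_fn, x0, alpha=1.0, beta=1.0, eps=0.0, T=1000):
    """SGD with a global generalized-AdaGrad stepsize (illustrative sketch).

    eta_t = alpha / (beta + sum_{i<t} ||g_i||^2) ** (0.5 + eps)

    grad_fn(x) should return a stochastic gradient at x.
    """
    x = np.asarray(x0, dtype=float)
    grad_norm_sq_sum = 0.0  # running sum of squared gradient norms
    for _ in range(T):
        g = grad_fn(x)
        # The stepsize uses only *past* gradients, so it is independent
        # of the current stochastic gradient g (the delayed variant).
        eta = alpha / (beta + grad_norm_sq_sum) ** (0.5 + eps)
        x = x - eta * g
        grad_norm_sq_sum += float(np.dot(g, g))
    return x

# Usage on a noisy quadratic f(x) = 0.5 * ||x||^2 (hypothetical test problem):
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x_final = sgd_adagrad_global(noisy_grad, x0=np.ones(5), T=5000)
```

Note that, unlike coordinate-wise AdaGrad, this variant scales the whole gradient by a single scalar stepsize, which is the setting the convergence analysis addresses.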
Abstract
Stochastic gradient descent is the method of choice for large scale optimization of machine learning objective functions. Yet, its performance is greatly variable and heavily depends on the choice of the stepsize …