Apr, 2019
Making the Last Iterate of SGD Information Theoretically Optimal
Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli
TL;DR
This paper designs new step-size sequences for SGD and GD that achieve information-theoretically optimal suboptimality guarantees for the last iterate, and uses simulations to verify that the new step-size sequences improve over standard ones. The work mainly concerns stochastic gradient descent, optimization, step-size sequences, suboptimality, and convergence rates.
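As a rough illustration of the kind of comparison described above, the sketch below runs SGD on a synthetic noisy least-squares problem and reports the last-iterate suboptimality under a standard 1/sqrt(t) schedule and under a simple phase-wise halving schedule. The problem instance, noise model, step-size constants, and the phase schedule itself are illustrative assumptions, not the sequences proposed in the paper.

```python
# A minimal simulation sketch (not the authors' code): it compares the last
# iterate of SGD under a standard 1/sqrt(t) step size with a phase-wise,
# geometrically halved schedule on a noisy least-squares problem. The
# objective, noise model, and the particular phase schedule are illustrative
# assumptions, not the schedules from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, T = 20, 10_000
A = rng.standard_normal((d, d)) / np.sqrt(d)
x_star = rng.standard_normal(d)
b = A @ x_star                      # noiseless targets; optimal loss is 0


def loss(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2


def stoch_grad(x, noise=0.5):
    # Exact gradient plus Gaussian noise as a stand-in for a stochastic gradient.
    return A.T @ (A @ x - b) + noise * rng.standard_normal(d)


def run_sgd(step_sizes):
    x = np.zeros(d)
    for eta in step_sizes:
        x = x - eta * stoch_grad(x)
    return x                        # return the last iterate only


# Standard schedule: eta_t = c / sqrt(t + 1).
standard = [0.2 / np.sqrt(t + 1) for t in range(T)]

# Phase-wise schedule (assumed for illustration): ~log2(T) phases, constant
# step within a phase, halved between consecutive phases.
num_phases = max(1, int(np.log2(T)))
phase_len = T // num_phases
phased, eta = [], 0.2
for _ in range(num_phases):
    phased.extend([eta] * phase_len)
    eta /= 2.0
phased = (phased + [eta] * T)[:T]   # pad or trim to exactly T steps

x_std = run_sgd(standard)
x_phs = run_sgd(phased)
print(f"last-iterate suboptimality, 1/sqrt(t) steps : {loss(x_std):.4e}")
print(f"last-iterate suboptimality, phase-wise steps: {loss(x_phs):.4e}")
```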
Abstract
Stochastic gradient descent (SGD) is one of the most widely used algorithms for large scale optimization problems. While classical theoretical analysis of SGD for convex problems studies (suffix) averages …
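For context, the contrast drawn in the abstract is between guarantees for a suffix average of the iterates and guarantees for the last iterate itself. The block below states the standard definitions; the suffix fraction α is a generic parameter, not a value taken from this excerpt.

```latex
% Suffix average vs. last iterate (standard definitions; the suffix fraction
% \alpha is a generic parameter, not a value quoted from the paper).
% Given SGD iterates x_1, \dots, x_T for a convex objective f with minimizer x^\ast:
\[
  \bar{x}^{\alpha}_T \;=\; \frac{1}{\alpha T}\sum_{t=(1-\alpha)T+1}^{T} x_t
  \qquad\text{(average over the last } \alpha T \text{ iterates),}
\]
\[
  \mathbb{E}\!\left[f(\bar{x}^{\alpha}_T)\right] - f(x^\ast)
  \quad\text{vs.}\quad
  \mathbb{E}\!\left[f(x_T)\right] - f(x^\ast)
  \qquad\text{(suffix-average vs. last-iterate suboptimality).}
\]
```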