Feb, 2020
Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik
TL;DR
This paper studies the convergence rate of stochastic gradient descent for nonconvex optimization via a new approach built on an expected-smoothness-type assumption, and examines its implications for finite-sum optimization problems under a variety of sampling strategies and minibatch sizes.
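For context, the expected smoothness (ES) condition the summary alludes to bounds the second moment of the stochastic gradient. A sketch of the condition, assuming constants $A, B, C \ge 0$ and a lower bound $f^{\inf}$ on $f$ in the paper's style (check the paper for the exact formulation):

$$
\mathbb{E}\big[\|g(x)\|^2\big] \;\le\; 2A\big(f(x) - f^{\inf}\big) + B\,\|\nabla f(x)\|^2 + C \qquad \text{for all } x,
$$

where $g(x)$ denotes the stochastic gradient estimator of $\nabla f(x)$.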
Abstract
Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, stochastic gradient descent (SGD) reigns supreme. We revisit the analysis …
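The finite-sum setting mentioned in the TL;DR can be made concrete with a minimal minibatch SGD loop. This is an illustrative sketch, not the paper's code: the names `sgd_finite_sum` and `grad_batch`, and the uniform sampling without replacement, are assumptions made for the example; the paper's analysis covers a broader family of sampling strategies.

```python
import numpy as np

def sgd_finite_sum(grad_batch, x0, n, stepsize, batch_size, iters, seed=0):
    """Minibatch SGD for f(x) = (1/n) * sum_i f_i(x).

    grad_batch(x, idx) is assumed to return the average gradient of the
    sampled components f_i, i in idx. Uniform sampling without replacement
    is used here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        idx = rng.choice(n, size=batch_size, replace=False)  # draw a minibatch
        x -= stepsize * grad_batch(x, idx)                   # fixed-stepsize SGD step
    return x

# Toy finite-sum example: least squares, f_i(x) = 0.5 * (a_i @ x - b_i)**2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad_batch = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = sgd_finite_sum(grad_batch, np.zeros(5), n=100, stepsize=0.05,
                       batch_size=10, iters=2000)
```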