BriefGPT.xyz
Jun, 2020
SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
Robert M. Gower, Othmane Sebbouh, Nicolas Loizou
TL;DR
This paper studies stochastic gradient descent (SGD) for optimizing non-convex functions. It establishes convergence theorems showing that, for non-convex problems satisfying certain structural assumptions, SGD converges to the global minimum. The analysis is based on an expected residual condition that is weaker than assumptions used in prior work.
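To make the setting concrete, here is a minimal sketch of plain SGD on a synthetic least-squares problem in the interpolation regime the summary mentions (every per-sample loss is minimized at the same point, so a constant step size suffices). The problem dimensions, step size, and iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic interpolated least-squares problem: labels are generated
# exactly as b = A @ x_star, so a single x_star minimizes every
# per-sample loss 0.5 * (a_i^T x - b_i)^2 (the interpolation setting).
n, d = 50, 5
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star  # exact labels: interpolation holds

def sgd(step_size=0.05, iters=5000):
    """Constant-step-size SGD with single-sample stochastic gradients."""
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)              # sample one data point uniformly
        grad = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th sample's loss
        x -= step_size * grad
    return x

x = sgd()
print(np.linalg.norm(x - x_star))  # small: under interpolation the gradient
                                   # noise vanishes at the solution
```

Under interpolation the stochastic gradient is exactly zero at `x_star`, which is why a fixed step size drives the iterates all the way to the solution instead of stalling in a noise-dominated neighborhood.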
Abstract
We provide several convergence theorems for SGD for two large classes of structured non-convex functions: (i) the Quasar (Strongly) Convex functions and (ii) the functions satisfying the …