October 2022
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan
TL;DR
By analyzing the properties of Gradient Flow as it converges on the objective function, this paper provides general conditions under which SGD converges, studies the connection between Lyapunov potentials and the geometric properties of the objective, and gives convergence guarantees for SGD that apply to a number of difficult problems.
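To fix notation for the objects the TL;DR refers to, here is a minimal sketch with symbols of our own choosing (w for parameters, F for the population loss, Φ for a Lyapunov potential; the paper's exact definitions may differ). Gradient flow on F is the continuous-time dynamics

$\dot{w}(t) = -\nabla F(w(t)),$

and a Lyapunov potential Φ certifies its convergence by decreasing along trajectories:

$\frac{d}{dt}\,\Phi(w(t)) = -\big\langle \nabla \Phi(w(t)),\, \nabla F(w(t)) \big\rangle \le 0.$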
Abstract
Stochastic gradient descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss. …
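For contrast with the continuous-time dynamics sketched above, the algorithm the guarantees are ultimately about is the discrete, noisy update (again, notation ours rather than the paper's):

$w_{k+1} = w_k - \eta\, \nabla f(w_k; \xi_k), \qquad \mathbb{E}_{\xi_k}\big[\nabla f(w_k; \xi_k)\big] = \nabla F(w_k),$

where ξ_k is a fresh random sample and η a step size; the paper's central question is when convergence of GF on the population loss transfers to this stochastic recursion.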