BriefGPT.xyz
Jul, 2024
Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates
Steffen Dereich, Robin Graeber, Arnulf Jentzen
TL;DR
We prove that adaptive stochastic gradient descent methods, such as the Adam optimizer, fail to converge to any possible random limit point whenever the learning rates do not converge to zero.
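The following is a minimal numerical sketch, not taken from the paper, illustrating the behavior the result describes: with a constant (hence non-vanishing) learning rate, Adam iterates on a simple stochastic objective keep fluctuating rather than settling at the minimizer. The toy objective f(theta) = E[(theta - X)^2 / 2] with X ~ N(0, 1), the function name `adam_constant_lr`, and all hyperparameter values are illustrative assumptions.

```python
# Illustrative sketch (assumed toy setup, not the paper's construction):
# run Adam with a constant learning rate on a noisy quadratic and check
# whether the late iterates settle down.
import numpy as np

rng = np.random.default_rng(0)

def adam_constant_lr(steps=50_000, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    theta, m, v = 1.0, 0.0, 0.0
    tail = []  # record late iterates to measure whether they converge
    for t in range(1, steps + 1):
        x = rng.normal()          # fresh sample -> stochastic gradient
        g = theta - x             # gradient of (theta - x)^2 / 2
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
        if t > steps - 1000:
            tail.append(theta)
    # a clearly nonzero spread of the last 1000 iterates indicates
    # the sequence is still fluctuating, i.e. not converging
    return float(np.std(tail))

print("std of last 1000 iterates:", adam_constant_lr())
```

With a learning-rate schedule that decays to zero, the same loop would show the spread of the late iterates shrinking, which is the contrast the non-vanishing assumption in the title points to.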
Abstract
Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence …