Feb, 2024
Iteration and Stochastic First-order Oracle Complexities of Stochastic Gradient Descent using Constant and Decaying Learning Rates
Kento Imaizumi, Hideaki Iiduka
TL;DR
Stochastic gradient descent (SGD) with a constant or decaying learning rate, combined with the critical batch size, minimizes the stochastic first-order oracle complexity of nonconvex optimization in deep learning and is practical compared with existing first-order optimizers.
Abstract
The performance of stochastic gradient descent (SGD), which is the simplest first-order optimizer for training deep neural networks, depends on not only the learning rate but also the batch size.
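As a rough illustration of the two learning-rate schedules discussed above, the following Python sketch runs mini-batch SGD on a toy least-squares problem with either a constant or a 1/sqrt(t) decaying step size. The schedule form, batch size, step count, and problem are illustrative assumptions, not the paper's method or experimental settings.

import numpy as np

# Toy least-squares data: loss(x) = 0.5 * E[(a^T x - b)^2]
rng = np.random.default_rng(0)
A = rng.standard_normal((1024, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.1 * rng.standard_normal(1024)

def sgd(schedule, eta0=0.05, batch_size=32, steps=500):
    """Mini-batch SGD with a constant or 1/sqrt(t) decaying learning rate."""
    x = np.zeros(10)
    for t in range(1, steps + 1):
        idx = rng.integers(0, A.shape[0], size=batch_size)          # sample a mini-batch
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size        # stochastic gradient
        eta = eta0 if schedule == "constant" else eta0 / np.sqrt(t) # learning-rate schedule
        x -= eta * grad                                             # SGD update
    return 0.5 * np.mean((A @ x - b) ** 2)                          # full-batch loss

for schedule in ("constant", "decaying"):
    print(schedule, sgd(schedule))

Each call to the stochastic gradient counts as batch_size evaluations of the first-order oracle, so the SFO complexity of a run is steps * batch_size; the paper studies how the learning-rate schedule and batch size jointly affect this quantity.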