Jun, 2023
When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen...
TL;DR
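By comparing stochastic gradient descent with momentum (SGDM) against stochastic gradient descent without momentum (SGD), the study finds that momentum's acceleration is tied to abrupt spikes, and that momentum acts to prevent or delay those spikes. It also examines the interplay among momentum, learning rate, and batch size, which can be used to speed up SGDM.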
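For readers who want the two update rules side by side, here is a minimal sketch. The summary does not state which momentum formulation the paper uses, so the heavy-ball form below (a velocity buffer `v` with coefficient `beta`) is an assumption, and the function names are illustrative only.

```python
def sgd_step(w, grad, lr):
    """Plain SGD: step against the (mini-batch) gradient."""
    return w - lr * grad


def sgdm_step(w, v, grad, lr, beta):
    """Heavy-ball momentum (assumed form): accumulate gradients in a
    velocity buffer, then step along the buffer.

    Returns the updated parameter and the updated velocity.
    """
    v = beta * v + grad
    return w - lr * v, v
```

With `beta = 0` the momentum step reduces to plain SGD. Under a constant gradient `g`, the velocity approaches `g / (1 - beta)`, so the steady-state step behaves like SGD with learning rate `lr / (1 - beta)`. This is one simple way to see why momentum and learning rate interact.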
Abstract
Momentum has become a crucial component in deep learning optimizers, necessitating a comprehensive understanding of when and why it accelerates s...