BriefGPT.xyz
Nov, 2023
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study
Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun
TL;DR
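Through an empirical study, we find that gradient descent with momentum, when run with a large learning rate and learning rate warmup, exhibits large catapults that drive the iterates toward flatter minima. We provide empirical evidence and a theoretical explanation that these catapults arise because momentum "amplifies" the self-stabilization effect.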
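For readers unfamiliar with the setup, the following is a minimal sketch of heavy-ball momentum gradient descent with a linear learning rate warmup. It is not the authors' code: the exact update form, the shape of the warmup schedule, the names `warmup_lr` and `momentum_gd`, and all hyperparameter values are assumptions chosen for illustration.

```python
def warmup_lr(step, peak_lr, warmup_steps):
    """Linear warmup to peak_lr over warmup_steps, then constant (assumed schedule)."""
    if warmup_steps <= 0:
        return peak_lr
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


def momentum_gd(grad, x0, peak_lr, beta=0.9, warmup_steps=100, num_steps=500):
    """Heavy-ball momentum GD: v <- beta * v + grad(x); x <- x - lr_t * v."""
    x, v = x0, 0.0
    trajectory = [x]
    for t in range(num_steps):
        lr = warmup_lr(t, peak_lr, warmup_steps)
        v = beta * v + grad(x)  # accumulate the momentum buffer
        x = x - lr * v          # step with the warmed-up learning rate
        trajectory.append(x)
    return trajectory


# Toy objective f(x) = 0.5 * a * x**2, whose curvature (sharpness) is a.
a = 1.0
traj = momentum_gd(lambda x: a * x, x0=1.0, peak_lr=0.1)
```

On this toy quadratic the curvature is fixed, so the run simply converges and no catapult can appear; the sketch only shows the optimizer and schedule that the summary refers to, not the phenomenon itself.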
Abstract
Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive. In this work, we empirically show that mom
→