A Study of Large Catapults in Momentum Gradient Descent

November 2023

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun

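TL;DR: Through an empirical study, we find that gradient descent with momentum, run with a large learning rate and learning rate warmup, exhibits large catapults that push the iterates toward flatter minima. We provide empirical evidence and a theoretical explanation indicating that these catapults arise because momentum "amplifies" the self-stabilization effect.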
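To make the setup concrete, here is a minimal sketch of the optimizer the summary refers to: Polyak heavy-ball momentum combined with learning rate warmup. The linear warmup schedule, the function names `warmup_lr` and `heavy_ball_gd`, and the toy quadratic objective are all assumptions made for illustration; the summary does not specify the schedule or momentum variant used in the paper's experiments. A convex quadratic like this one will not exhibit catapults, so the sketch only shows the update rule, not the phenomenon.

```python
def warmup_lr(step, peak_lr, warmup_steps):
    """Linearly increase the learning rate to peak_lr over warmup_steps.

    Assumed schedule for illustration; not taken from the paper.
    """
    if warmup_steps <= 0:
        return peak_lr
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


def heavy_ball_gd(grad, x0, peak_lr, beta, warmup_steps, num_steps):
    """Polyak heavy-ball momentum with a warmed-up learning rate.

    Update: x_{t+1} = x_t - lr_t * grad(x_t) + beta * (x_t - x_{t-1})
    """
    x_prev = x0
    x = x0
    trajectory = [x0]
    for t in range(num_steps):
        lr = warmup_lr(t, peak_lr, warmup_steps)
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory


# Toy objective f(x) = 0.5 * x**2, whose gradient is x (hypothetical example).
trajectory = heavy_ball_gd(
    grad=lambda x: x,
    x0=1.0,
    peak_lr=0.1,
    beta=0.9,
    warmup_steps=10,
    num_steps=500,
)
final_x = trajectory[-1]
```

Setting `beta=0.0` recovers plain gradient descent with the same warmup, which is the natural baseline for checking how much momentum changes the trajectory.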

Abstract

Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive. In this work, we empirically show that mom