The sharpness-aware minimization (SAM) algorithm and its variants, including surrogate gap guided SAM (GSAM), have been successful at improving the generalization capability of deep neural network models by finding flat local minima of the empirical loss in training. Meanwhile, it has been shown theoretically and empirically that increasing the batch size or decaying the learning rate avoids sharp local minima of the empirical loss. In this paper, we consider the GSAM algorithm with increasing batch sizes or decaying learning rates, such as cosine annealing or linear decay, and theoretically show its convergence. Moreover, we numerically compare SAM (GSAM) with and without an increasing batch size and conclude that using an increasing batch size or a decaying learning rate finds flatter local minima than using a constant batch size and learning rate.
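The GSAM update referred to above can be sketched in a few lines. This is a minimal pure-Python illustration that assumes the update rule from the original GSAM proposal, not necessarily the exact variant analyzed in this paper. First, take a SAM ascent step of radius `rho` along the normalized gradient and compute the perturbed gradient `g_p`. Next, split the ordinary gradient `g` into components parallel and orthogonal to `g_p`. Finally, descend along `g_p - alpha * g_perp`. The names `gsam_step`, `rho`, and `alpha` are illustrative. Setting `alpha = 0` recovers plain SAM.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def gsam_step(w: Vector, grad: Callable[[Vector], Vector], lr: float,
              rho: float, alpha: float) -> Vector:
    """One GSAM-style update on parameters w (illustrative sketch, not the paper's code)."""
    g = grad(w)                        # gradient of the loss f at w
    g_norm = norm(g)
    if g_norm == 0.0:
        return list(w)                 # stationary point: no ascent direction defined
    # SAM ascent: perturb w along the normalized gradient with radius rho
    w_adv = [wi + rho * gi / g_norm for wi, gi in zip(w, g)]
    g_p = grad(w_adv)                  # gradient of the perturbed loss f_p
    # Decompose g into parts parallel and orthogonal to g_p
    gp_sq = dot(g_p, g_p)
    coef = dot(g, g_p) / gp_sq if gp_sq > 0.0 else 0.0
    g_perp = [gi - coef * gpi for gi, gpi in zip(g, g_p)]
    # GSAM direction: descend on f_p while moving against g_perp (alpha = 0 gives SAM)
    d = [gpi - alpha * gpe for gpi, gpe in zip(g_p, g_perp)]
    return [wi - lr * di for wi, di in zip(w, d)]


if __name__ == "__main__":
    # Toy anisotropic quadratic f(w) = 0.5 * (w0^2 + 10 * w1^2), gradient [w0, 10 * w1]
    quad_grad = lambda v: [v[0], 10.0 * v[1]]
    w = [1.0, 1.0]
    for _ in range(100):
        w = gsam_step(w, quad_grad, lr=0.05, rho=0.01, alpha=0.1)
    print(w)
```

With a fixed `rho`, the perturbation keeps a constant length even near a minimizer. The iterates therefore hover in a small neighborhood of the minimum rather than converging exactly. Decaying the step size, or reducing gradient noise by growing the batch, is what convergence analyses of this kind typically rely on.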

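This work addresses the problem of finding flat local minima when training deep neural network models. Through theoretical analysis, the paper shows that the surrogate gap guided sharpness-aware minimization algorithm (GSAM) converges when combined with a gradually increasing batch size or a decaying learning rate. Numerical comparisons indicate that this approach finds flatter local minima than training with a constant batch size and learning rate.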
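The schedules mentioned above can be written down concretely. The sketch below assumes common textbook forms. Cosine annealing moves from `lr_max` to `lr_min` along half a cosine period. Linear decay interpolates between the same endpoints. The batch size is multiplied by a fixed factor every few epochs. The growth factor, interval, cap, and all function names are hypothetical, since the text does not specify which exact schedules or hyperparameters the paper uses.

```python
import math
from typing import Optional


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine annealing: lr_max at step 0, lr_min at step total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def linear_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Linear decay from lr_max to lr_min over total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_max - (lr_max - lr_min) * progress


def increasing_batch_size(epoch: int, b0: int, growth: int = 2, interval: int = 10,
                          b_max: Optional[int] = None) -> int:
    """Multiply the batch size by `growth` every `interval` epochs (hypothetical parameters)."""
    b = b0 * growth ** (epoch // interval)
    return b if b_max is None else min(b, b_max)


if __name__ == "__main__":
    total_epochs = 40
    for epoch in range(0, total_epochs + 1, 10):
        # Regime A: constant learning rate, growing batch size
        # Regime B: constant batch size, decaying learning rate
        print(epoch,
              increasing_batch_size(epoch, b0=32),
              round(cosine_lr(epoch, total_epochs, lr_max=0.1), 4),
              round(linear_lr(epoch, total_epochs, lr_max=0.1), 4))
```

Either schedule would be passed to the optimizer step, for example as the `lr` argument of a SAM/GSAM update, while the batch-size schedule controls how many samples each minibatch gradient averages over.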

Convergence of Sharpness-Aware Minimization Algorithms Using Increasing Batch Size and Decaying Learning Rate