Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (comprising less than 0.1% of the total parameters) in the adversarial step of SAM outperforms perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness. The code for our experiments is publicly available at https://github.com/mueller-mp/SAM-ON.

本文研究了对小部分参数进行扰动的Sharpness-aware minimization (SAM)的性能，并通过实验结果证明，只操作正规化处理中几乎不占比例的仿射变换参数能比全局扰动得到更好的效果。

规范层是Sharpness-Aware最小化的全部需求