Mar, 2022


TL;DR: Surrogate Gap Guided Sharpness-Aware Minimization (GSAM) improves generalization over Sharpness-Aware Minimization (SAM). It introduces the surrogate gap, the difference between the perturbed loss and the original loss, as a measure of sharpness, and defines a two-step update: a gradient descent step that lowers the perturbed loss, as in SAM, and an ascent step in the orthogonal direction that shrinks the surrogate gap without affecting the perturbed loss. Together the two steps seek a region with both low loss and low sharpness. On ImageNet top-1 accuracy for ViT-B/32, GSAM generalizes better than both SAM and AdamW, with negligible computation overhead.
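
The two-step update summarized above can be sketched on a toy problem. The block below is a minimal illustration in plain Python, not the authors' implementation: the quadratic `loss`, the helper names, and the values of `rho` (perturbation radius), `alpha` (weight of the orthogonal step), and `lr` are all assumptions chosen for readability. It follows the update as it is commonly described: compute the gradient at the perturbed point, split the original gradient into parts parallel and orthogonal to it, and subtract a scaled orthogonal part from the descent direction.

```python
import math

def loss(w):
    # Hypothetical toy objective: an ill-conditioned quadratic.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [w[0], 10.0 * w[1]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def perturb(w, rho, eps=1e-12):
    # SAM-style perturbation: move a distance rho along the normalized gradient.
    g = grad(w)
    scale = rho / (norm(g) + eps)
    return [wi + scale * gi for wi, gi in zip(w, g)]

def surrogate_gap(w, rho):
    # Perturbed loss minus original loss, used as the sharpness measure.
    return loss(perturb(w, rho)) - loss(w)

def gsam_direction(w, rho, alpha, eps=1e-12):
    g = grad(w)                    # gradient of the original loss
    g_p = grad(perturb(w, rho))    # gradient taken at the perturbed point
    # Remove from g its projection onto g_p, leaving the orthogonal component.
    coef = dot(g, g_p) / (dot(g_p, g_p) + eps)
    g_perp = [gi - coef * gpi for gi, gpi in zip(g, g_p)]
    # Descent on the perturbed loss, combined with an ascent step on the
    # original loss along g_perp. To first order the ascent part leaves the
    # perturbed loss unchanged while reducing the gap.
    direction = [gpi - alpha * gpe for gpi, gpe in zip(g_p, g_perp)]
    return direction, g_p, g_perp

def gsam_step(w, lr, rho, alpha):
    direction, _, _ = gsam_direction(w, rho, alpha)
    return [wi - lr * di for wi, di in zip(w, direction)]

def run(w0, steps=50, lr=0.05, rho=0.05, alpha=0.4):
    # Apply the update repeatedly; all hyperparameters are illustrative.
    w = list(w0)
    for _ in range(steps):
        w = gsam_step(w, lr, rho, alpha)
    return w
```

Setting `alpha` to 0 in this sketch reduces the direction to the perturbed-point gradient alone, which corresponds to a plain SAM step. In practice the same logic would be applied to minibatch gradients of a neural network rather than to an analytic gradient.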