Mar, 2022
代理间隙最小化提高锐度感知训练
Surrogate Gap Minimization Improves Sharpness-Aware Training
TL;DRSurrogate Gap Guided Sharpness-Aware Minimization (GSAM) improves generalization by introducing a surrogate gap to measure low sharpness and defining a two-step optimization process involving gradient descent and an ascent step in the orthogonal direction to reach both low loss and low sharpness, achieving better generalization than Sharpness-Aware Minimization (SAM) and AdamW on ImageNet top-1 accuracy for ViT-B/32 with negligible computation overhead.