Jun, 2022

探索锐度感知最小化理解

TL;DRSharpness-Aware Minimization (SAM) relies on worst-case weight perturbations to improve generalization; we provide a more complete theoretical framework for SAM's success, analyze its implicit bias on diagonal linear networks and empirically on fine-tuning non-linear networks, and provide convergence results for non-convex objectives when used with stochastic gradients.