We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-{\L}ojasiewicz (PL) and the quadratic growth (QG) conditions. We further show that these conditions arise for some neural networks with linear activations. We use our black-box results to establish the stability of optimization algorithms such as stochastic gradient descent (SGD), gradient descent (GD), randomized coordinate descent (RCD), and the stochastic variance reduced gradient method (SVRG), in both the PL and the strongly convex setting. Our results match or improve state-of-the-art generalization bounds and can easily be extended to similar optimization algorithms. Finally, we show that although our results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima where SGD is stable but GD is not.
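
For reference, the two geometric conditions named above are commonly stated as follows. This is a sketch in assumed notation, not taken from the abstract: $f$ is the loss, $f^*$ its global minimum value, $\mu > 0$ a constant, and $w_p$ the Euclidean projection of $w$ onto the set of global minimizers.

\[
\text{(PL)} \quad \tfrac{1}{2}\,\|\nabla f(w)\|^2 \;\ge\; \mu\,\bigl(f(w) - f^*\bigr),
\qquad
\text{(QG)} \quad f(w) - f^* \;\ge\; \tfrac{\mu}{2}\,\|w - w_p\|^2 .
\]

Neither condition requires convexity. Under PL every stationary point is a global minimizer, while QG only bounds how slowly the loss can grow away from the set of minimizers.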

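By establishing black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function, this paper derives new generalization bounds for learning algorithms that converge to global minima. The results apply to nonconvex loss functions satisfying the Polyak-{\L}ojasiewicz (PL) and quadratic growth (QG) conditions, which arise for some neural networks with linear activations. The black-box results are used to establish the stability of optimization algorithms such as SGD, GD, RCD, and SVRG in both the PL and the strongly convex settings, and they extend to similar algorithms. The paper also shows that, although these results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima on which SGD is stable but GD is not.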

Stability and Generalization of Learning Algorithms that Converge to Global Optima