We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, when the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. We show that this eigenvalue can control the stability of gradient descent, and use this to prove generalisation error bounds that hold under a wider range of step sizes than in previous work. Out-of-sample guarantees then follow by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off against the algorithmic parameters, the sample size and the magnitude of this eigenvalue. In the case of a two-layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity: specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., the network scaling. This yields test error guarantees when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, we gain insights into the implicit bias of neural network scaling, which are further supported by experimental findings.
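
The three ingredients named above (weak convexity, stability of gradient descent, and the error decomposition) can be written out with standard definitions. The following is a sketch, not the paper's exact statements. $R_S$ and $R$ denote the empirical and population risks, and $\epsilon \ge 0$ bounds the magnitude of the smallest negative Hessian eigenvalue. $\beta$ is an assumed smoothness constant and $\eta$ the step size. $w_T$ is the output of gradient descent on a sample $S$, $w^\star$ a population risk minimiser, and $w$ any reference point chosen independently of $S$. All of these symbols are our own labels.

```latex
% Weak convexity (global form; a local version restricts w to the region visited during training)
\lambda_{\min}\big(\nabla^2 R_S(w)\big) \ge -\epsilon
\quad\Longleftrightarrow\quad
w \mapsto R_S(w) + \tfrac{\epsilon}{2}\|w\|^2 \ \text{is convex}.

% Gradient descent map G(w) = w - \eta \nabla R_S(w). Assuming also \nabla^2 R_S(w) \preceq \beta I
% and \eta \le 2/\beta, the eigenvalues of I - \eta \nabla^2 R_S lie in [-1, 1 + \eta\epsilon], hence
\|G(w) - G(w')\| \le (1 + \eta\epsilon)\,\|w - w'\|,
% versus the factor 1 + \eta\beta available from smoothness alone.

% Error decomposition around a reference point w independent of S, using \mathbb{E}_S[R_S(w)] = R(w):
\mathbb{E}[R(w_T)] - R(w^\star)
  = \underbrace{\mathbb{E}[R(w_T) - R_S(w_T)]}_{\text{generalisation}}
  + \underbrace{\mathbb{E}[R_S(w_T) - R_S(w)]}_{\text{optimisation}}
  + \underbrace{R(w) - R(w^\star)}_{\text{approximation}}.
```

Since $\epsilon$ can be much smaller than $\beta$, the per-step expansion factor $1 + \eta\epsilon$ compounds more slowly over $T$ iterations. This is one way weak convexity can relax step-size restrictions in stability arguments.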

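The claim that layer normalisation controls the Hessian's smallest eigenvalue can be checked numerically. This is a minimal NumPy sketch under our own assumptions, none of which are taken from the abstract: a tanh activation, squared loss, fixed output weights $v_j \in \{-1, +1\}$, output scaling $m^{-c}$, and training only the first-layer weights. It computes the exact Hessian of the empirical risk at one random point. It then compares the smallest eigenvalue with a lower bound from Weyl's inequality, which scales as $m^{-c}$ times a residual-weighted average of $\|x_i\|^2$. The names `empirical_risk_hessian`, `weak_convexity_bound` and `SIGMA2_SUP` ($\sup_z |\tanh''(z)| = 4/(3\sqrt{3})$) are hypothetical.

```python
import numpy as np

# Hypothetical setup: f(x) = m^{-c} * sum_j v_j * tanh(<w_j, x>), squared loss,
# first-layer weights W trained, output weights v_j in {-1, +1} held fixed.
SIGMA2_SUP = 4.0 / (3.0 * np.sqrt(3.0))  # sup_z |tanh''(z)|


def empirical_risk_hessian(W, v, X, y, c):
    """Exact Hessian of R_S(W) = (1/2n) sum_i (f(x_i) - y_i)^2 w.r.t. W.ravel()."""
    m, d = W.shape
    n = X.shape[0]
    scale = m ** (-c)
    t = np.tanh(X @ W.T)        # (n, m) hidden activations
    dt = 1.0 - t**2             # tanh'
    d2t = -2.0 * t * dt         # tanh''
    r = scale * t @ v - y       # residuals f(x_i) - y_i
    H = np.zeros((m * d, m * d))
    for i in range(n):
        # Gauss-Newton part: outer product of grad f(x_i), always PSD.
        g = ((scale * v * dt[i])[:, None] * X[i][None, :]).ravel()
        H += np.outer(g, g)
        # Residual-weighted curvature of f(x_i): block diagonal, one d x d block per neuron.
        xx = np.outer(X[i], X[i])
        for j in range(m):
            blk = slice(j * d, (j + 1) * d)
            H[blk, blk] += r[i] * scale * v[j] * d2t[i, j] * xx
    return H / n, r


def weak_convexity_bound(r, X, m, c):
    """Weyl lower bound on lambda_min of the Hessian (assumes |v_j| = 1)."""
    return -np.mean(np.abs(r) * (m ** (-c)) * SIGMA2_SUP * np.sum(X**2, axis=1))


rng = np.random.default_rng(0)
n, d, m = 10, 3, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W = rng.normal(size=(m, d))
v = rng.choice([-1.0, 1.0], size=m)

results = {}
for c in (0.5, 1.0):
    H, r = empirical_risk_hessian(W, v, X, y, c)
    lam_min = np.linalg.eigvalsh(H)[0]  # eigenvalues are returned in ascending order
    bound = weak_convexity_bound(r, X, m, c)
    results[c] = (lam_min, bound)
    print(f"c={c}: lambda_min={lam_min:.4f}, lower bound={bound:.4f}")
```

This is a pointwise check at a single parameter value. Local weak convexity in the sense described above concerns the region visited during training, where the residuals, and hence the bound, change over time. For residuals of comparable size, increasing $m$ or $c$ moves the bound toward zero.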
Learning with Gradient Descent and Weakly Convex Losses