We present a new approach to understanding the relationship between loss curvature and generalisation in deep learning. Specifically, we use existing empirical analyses of the spectrum of deep network loss Hessians to ground an ansatz tying together the loss Hessian and the input-output Jacobian of a deep neural network. We then prove a series of theoretical results which quantify the degree to which the input-output Jacobian of a model approximates its Lipschitz norm over a data distribution, and deduce a novel generalisation bound in terms of the empirical Jacobian. We use our ansatz, together with our theoretical results, to give a new account of the recently observed progressive sharpening phenomenon, as well as the generalisation properties of flat minima. Experimental evidence is provided to validate our claims.

Progressive sharpening, flat minima and generalisation