It has been empirically observed that the flatness of the minima found when training deep networks appears to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescalings of the network parameters that leave the represented function unchanged. Such a measure can therefore be made arbitrarily small or arbitrarily large through rescaling, which renders its numerical value meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure on the parameter space. Using this manifold structure and an appropriate metric, we propose a Hessian-based measure of flatness that is invariant to rescaling. We use this new measure to confirm the proposition that large-batch SGD minima are indeed sharper than small-batch SGD minima.
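The rescaling problem can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the construction used in the paper: a two-parameter ReLU network f(x) = w2 * relu(w1 * x), a squared loss on two hypothetical data points, and the trace of the loss Hessian (estimated by finite differences) as a stand-in for a conventional, non-invariant sharpness measure. The rescaling (w1, w2) -> (alpha * w1, w2 / alpha) with alpha > 0 leaves the function unchanged but changes the Hessian trace.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def net(w1, w2, x):
    # Toy network (illustrative assumption): one hidden ReLU unit, no biases.
    return w2 * relu(w1 * x)

def loss(w1, w2, data):
    # Squared loss summed over the data points.
    return 0.5 * sum((net(w1, w2, x) - y) ** 2 for x, y in data)

def hessian_trace(w1, w2, data, h=1e-4):
    # Central second differences along each parameter axis.
    # The trace is used here only as an example of a non-invariant measure.
    base = loss(w1, w2, data)
    d11 = (loss(w1 + h, w2, data) - 2.0 * base + loss(w1 - h, w2, data)) / h**2
    d22 = (loss(w1, w2 + h, data) - 2.0 * base + loss(w1, w2 - h, data)) / h**2
    return d11 + d22

def rescale(w1, w2, alpha):
    # Positive homogeneity of ReLU: relu(alpha * z) = alpha * relu(z) for alpha > 0,
    # so this reparameterization represents the same function.
    return alpha * w1, w2 / alpha

# Hypothetical data generated by y = 2x with x > 0, so any w1 * w2 = 2 (w1 > 0) is a minimum.
data = [(1.0, 2.0), (2.0, 4.0)]
w1, w2 = 1.0, 2.0

for alpha in (1.0, 10.0, 100.0):
    a1, a2 = rescale(w1, w2, alpha)
    print(alpha, net(a1, a2, 3.0), loss(a1, a2, data), hessian_trace(a1, a2, data))
```

In this toy setting the Hessian trace at a zero-loss point equals (sum of x^2) * (w1^2 + w2^2), so it grows roughly as alpha^2 while the network output and the loss stay the same. A flatness measure that is invariant to rescaling would assign the same value to all of these parameter settings.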

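We show that the equivalence relations of deep networks with positively homogeneous activations induce a quotient manifold structure on the parameter space, propose a Hessian-based flatness measure that is invariant under these equivalences, and use it to verify that large-batch SGD minima are indeed sharper than small-batch SGD minima.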

A Scale-Invariant Flatness Measure for Deep Network Minima