Recent studies have shown that the generalization of neural networks is correlated with the sharpness of the loss landscape, and that flat minima tend to generalize better than sharp minima. In this paper, we propose a novel method called \emph{optimum shifting}, which moves the parameters of a neural network from a sharp minimum to a flatter one while keeping the training loss unchanged. Our method is based on the observation that, when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations. The parameters can therefore be adjusted within the solution space of these systems, which is accomplished simply by solving a constrained optimization problem. Furthermore, we introduce a practical stochastic optimum shifting technique that uses Neural Collapse theory to reduce computational cost and to provide more degrees of freedom for optimum shifting. Extensive experiments on classification and detection, with various deep neural network architectures on benchmark datasets, demonstrate the effectiveness of our method.
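
The under-determined-system observation can be illustrated for a single linear layer. The sketch below is only a minimal illustration under stated assumptions, not the paper's implementation: it assumes the constrained problem is to minimize the Frobenius norm of the weights subject to reproducing the same layer outputs on a fixed batch, and it solves that problem in closed form with the pseudoinverse. The function name `shift_weights` and the toy dimensions are hypothetical.

```python
import numpy as np

def shift_weights(X, W):
    """Return the minimum-Frobenius-norm W_new satisfying X @ W_new == X @ W.

    Assumption: the objective (smallest Frobenius norm) is chosen for
    illustration; the abstract only says "a constrained optimization problem".
    """
    # pinv(X) @ X projects each column of W onto the row space of X,
    # discarding the null-space component that does not affect X @ W.
    return np.linalg.pinv(X) @ (X @ W)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 32))   # 8 samples, 32 features: under-determined
W = rng.standard_normal((32, 4))   # original layer weights

W_new = shift_weights(X, W)

# The layer outputs on this batch are unchanged, so the loss on it is too.
outputs_match = np.allclose(X @ W, X @ W_new)
norm_before = np.linalg.norm(W)
norm_after = np.linalg.norm(W_new)
```

Because the batch has fewer samples than input features, infinitely many weight matrices produce the same outputs, and the sketch picks the one with the smallest norm. Whether a smaller norm corresponds to a flatter minimum depends on the sharpness measure used, which the abstract does not specify.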

Improving Generalization of Deep Neural Networks by Optimum Shifting