Mar, 2019
Implicit Regularization in Over-parameterized Neural Networks
Masayoshi Kubo, Ryotaro Banno, Hidetaka Manabe, Masataka Minoji
TL;DR
By introducing statistics such as the gradient gap deviation and gradient deflection, this paper studies, both theoretically and empirically, how implicit regularization operates in ReLU neural networks, showing that random initialization together with stochastic gradient descent effectively controls the network output so that it interpolates almost linearly between samples and has low complexity.
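A minimal sketch (not the authors' code) of the setting the TL;DR describes: an over-parameterized one-hidden-layer ReLU network, randomly initialized and trained with plain SGD and no explicit regularizer, whose learned function can then be probed between training samples. All hyperparameters here (width 1000, learning rate 0.1, 2000 steps, the toy data) are illustrative assumptions.

```python
# Illustrative sketch only: probe implicit regularization in an
# over-parameterized ReLU network trained by SGD (no weight decay,
# no dropout, no explicit regularization of any kind).
import torch

torch.manual_seed(0)

# Five 1-D training samples (toy data, chosen arbitrarily).
x = torch.tensor([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = torch.tensor([[0.5], [-0.3], [0.2], [0.8], [-0.1]])

# Over-parameterized: far more hidden units than training points.
model = torch.nn.Sequential(
    torch.nn.Linear(1, 1000),
    torch.nn.ReLU(),
    torch.nn.Linear(1000, 1),
)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# Evaluate on a dense grid: with random initialization and SGD alone,
# the fitted function tends to change almost linearly between
# neighbouring samples rather than oscillating wildly.
grid = torch.linspace(-2.5, 2.5, 11).unsqueeze(1)
with torch.no_grad():
    print(torch.hstack([grid, model(grid)]))
```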
Abstract
Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role …