Over-parameterized deep neural networks trained with simple first-order methods are known to be able to fit any labeling of the data. When the training dataset contains a fraction of noisy labels, can neural networks resist over-fitting and still generalize on the true distribution? Inspired by recent theoretical work establishing connections between over-parameterized neural networks and the neural tangent kernel (NTK), we propose two simple regularization methods for this purpose: (i) regularization by the distance between the network parameters and their initialization, and (ii) adding a trainable auxiliary variable to the network output for each training example. Theoretically, both methods are related to kernel ridge regression with respect to the NTK, and we prove that they enjoy a generalization guarantee on the true data distribution despite being trained on noisy labels. The generalization bound is independent of the network size and depends only on the training inputs and the true labels (rather than the noisy labels), as well as on the noise level in the labels. Empirical results verify the effectiveness of these methods on noisily labeled datasets.
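As a rough illustration of the kernel ridge regression connection, the sketch below works entirely in an assumed linearized (NTK-style) regime: the network is replaced by a model that is linear in its parameters, f(θ, x) = φ(x)ᵀθ, with fixed random ReLU features φ standing in for the network's gradient at initialization. The objective scalings (a λ²/2 penalty for method (i), an output offset λ·b_i for method (ii)), the feature map, the data, and all variable names are our own hypothetical choices, not taken from the paper. Under these assumptions, gradient descent on either objective should reach the same test predictions as kernel ridge regression with kernel K = ΦΦᵀ and ridge λ², fitted to the noisy labels minus the predictions at initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n training inputs in d dimensions, p random ReLU features.
n, d, p, lam = 20, 5, 200, 1.0
W = rng.standard_normal((p, d))  # fixed random first layer (assumption)

def features(X):
    """phi(x): random ReLU features; stand-in for the linearized network's gradient."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(p)

X = rng.standard_normal((n, d))
X_test = rng.standard_normal((5, d))
y_true = np.sign(X[:, 0])
flip = rng.random(n) < 0.2                     # corrupt roughly 20% of the labels
y_noisy = np.where(flip, -y_true, y_true)

Phi, Phi_test = features(X), features(X_test)
theta0 = rng.standard_normal(p) / np.sqrt(p)   # "initialization" of the linear model
L = np.linalg.norm(Phi, 2) ** 2 + lam ** 2     # smoothness constant shared by both losses
eta, steps = 1.0 / L, 5000

# (i) Distance to initialization: 1/2 ||Phi theta - y||^2 + lam^2/2 ||theta - theta0||^2
theta = theta0.copy()
for _ in range(steps):
    grad = Phi.T @ (Phi @ theta - y_noisy) + lam ** 2 * (theta - theta0)
    theta -= eta * grad
pred_reg = Phi_test @ theta

# (ii) Auxiliary variables: 1/2 ||Phi theta + lam * b - y||^2, with b starting at 0.
# b only exists for training examples and is dropped at test time.
theta, b = theta0.copy(), np.zeros(n)
for _ in range(steps):
    r = Phi @ theta + lam * b - y_noisy        # residual on the training set
    theta, b = theta - eta * (Phi.T @ r), b - eta * lam * r
pred_aux = Phi_test @ theta

# Kernel ridge regression with K = Phi Phi^T, fitted to the residual left by initialization.
K, K_test = Phi @ Phi.T, Phi_test @ Phi.T
alpha = np.linalg.solve(K + lam ** 2 * np.eye(n), y_noisy - Phi @ theta0)
pred_krr = Phi_test @ theta0 + K_test @ alpha

print(np.allclose(pred_reg, pred_krr), np.allclose(pred_aux, pred_krr))  # expected: True True
```

The reason the two methods agree here: in method (ii) the joint parameters (θ, b) see the feature matrix [Φ, λI], whose kernel is ΦΦᵀ + λ²I, so gradient descent started from b = 0 converges to the same θ as the explicit penalty in method (i). Outside this linear setting the two methods need not coincide exactly.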

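This work studies regularization methods for over-parameterized deep neural networks trained with noisy labels. Two effective methods are regularizing by the distance between the parameters and their initialization, and adding a trainable auxiliary variable for each training example. Experiments show that these methods effectively improve generalization, and the theoretical upper bound on the generalization error is independent of the network size and can approach the level attained without label noise.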

Simple and effective regularization methods for training on noisily labeled data with generalization guarantees