Neural networks are a powerful class of functions that can be trained with simple gradient descent to achieve state-of-the-art performance on a variety of applications. Despite their practical success, there is a paucity of results that provide theoretical guarantees on why they are so effective. At the heart of the problem lies the difficulty of analyzing the non-convex objective function, which can have numerous local minima and saddle points. Can neural networks corresponding to the stationary points of the objective function learn the true labeling function? If so, what are the key factors contributing to such generalization ability? In this paper, we provide answers to these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units can learn the true function. We bypass the non-convexity issue by directly analyzing the first-order condition, and show that the loss is bounded if the smallest singular value of the "extended feature matrix" is large enough. We make novel use of techniques from kernel methods and geometric discrepancy, and identify a new relation linking the smallest singular value to the spectrum of a kernel function associated with the activation function and to the diversity of the units. Our results also suggest a novel regularization function to promote unit diversity for potentially better generalization ability.
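
To make the first-order argument concrete, the following is a minimal NumPy sketch under stated assumptions, not the paper's own code. It assumes a network f(x) = sum_j v_j ReLU(w_j . x) with fixed output weights v_j in {-1, +1}, unit-norm inputs, and the squared loss L = (1/2n) sum_l r_l^2, where r_l = f(x_l) - y_l. Under these assumptions the gradient with respect to the hidden weights is (1/n) D r, where column l of D stacks v_j 1[w_j . x_l > 0] x_l over the units j. The names `extended_feature_matrix`, `residual`, and the normalization are illustrative choices; the paper's exact definition may differ by scaling. When D has at least as many rows as columns (k*d >= n), this gives ||r|| <= n ||grad|| / s_min(D), so a small gradient together with a large smallest singular value forces a small loss.

```python
import numpy as np

def extended_feature_matrix(W, v, X):
    """Hypothetical helper: build D of shape (k*d, n).

    W: (k, d) hidden weights, v: (k,) fixed output weights, X: (n, d) inputs.
    Column l stacks v_j * 1[w_j . x_l > 0] * x_l over the hidden units j.
    """
    act = (X @ W.T > 0).astype(X.dtype)              # (n, k) ReLU derivative
    blocks = (act * v)[:, :, None] * X[:, None, :]   # (n, k, d)
    return blocks.reshape(X.shape[0], -1).T          # (k*d, n)

def residual(W, v, X, y):
    """Prediction error r_l = f(x_l) - y_l for every sample."""
    return np.maximum(X @ W.T, 0.0) @ v - y          # (n,)

# Toy data (all values below are assumptions for illustration only).
rng = np.random.default_rng(0)
n, d, k = 20, 5, 8                                   # k*d >= n is required
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm inputs
v = rng.choice([-1.0, 1.0], size=k)
W_true = rng.standard_normal((k, d))
y = np.maximum(X @ W_true.T, 0.0) @ v                # labels from a teacher network
W = rng.standard_normal((k, d))                      # current (untrained) weights

D = extended_feature_matrix(W, v, X)
r = residual(W, v, X, y)
grad = D @ r / n                                     # flattened dL/dW, shape (k*d,)

# Singular values come back in descending order; the last is the smallest.
s_min = np.linalg.svd(D, compute_uv=False)[-1]
bound = n * np.linalg.norm(grad) / s_min             # upper bound on ||r||

print("||r|| =", np.linalg.norm(r), " bound =", bound)
```

The bound holds at any point W, not only at stationary points; at an approximate stationary point the gradient norm is small, which is what makes the bound informative. Note that if k*d < n, D has a non-trivial null space and the inequality above no longer follows.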

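By analyzing one-hidden-layer neural networks with ReLU activation, we find that neural networks have favorable optimization properties: networks with diverse units have no spurious local minima, and the loss can be made arbitrarily small provided the smallest singular value of the "extended feature matrix" is large enough.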

Diverse Neural Network Learns True Target Functions