Nov, 2018
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
TL;DR
By analyzing the learning theory of neural networks in the overparameterized regime, this paper proves that such networks can learn certain important concept classes simply by running SGD, with a sample complexity that is almost independent of the number of network parameters. In addition, it develops a quadratic approximation of the network and connects it to the SGD theory of escaping saddle points.
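The training procedure the paper analyzes is plain SGD on a heavily overparameterized ReLU network. The following is a minimal, hypothetical PyTorch sketch of that setup, not the paper's code; the target function, dimensions, hidden width, learning rate, and step count are all illustrative assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes: hidden width chosen much larger than the sample count,
# mimicking the overparameterized regime studied in the paper.
d, n, width = 10, 200, 4096
X = torch.randn(n, d)
y = torch.sin(X[:, :1]) * torch.cos(X[:, 1:2])  # a smooth synthetic target concept

# Three-layer (two hidden layers) ReLU network, the "beyond two layers" setting.
model = nn.Sequential(
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  train MSE {loss.item():.4f}")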
Abstract
Neural networks have great success in many machine learning applications, but the fundamental learning theory behind them remains largely unsolved. Learning …