In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations are fewer than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.

研究浅层神经网络在过参数化情况下，如何使用二次激活函数进行训练并找到全局最优解，结果表明此方法适用于具有任意输入/输出对的任何训练数据，并可使用各种本地搜索启发式方法高效地找到全局最优解。同时，对於差分激活函数，我们也证明了梯度下降法在得到合适的初值后可以以线性速度收敛到全局最优解，它的输入来自符合高斯分布的选定属性且标记是通过种植的重量系数生成的。

过度参数的浅层神经网络优化空间的理论洞见