We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing about the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.
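
To make the one-hidden-layer claim concrete, the following is a minimal numerical sketch, not the authors' code. It assumes a leaky-ReLU activation, a multiplicative per-unit noise standing in for the "dropout-like noise", and hypothetical names and sizes (`gradient_matrix`, `d0`, `d1`, `N`, `slope`) chosen only for illustration. At a differentiable point, setting the gradient of the quadratic loss with respect to the first-layer weights to zero gives a linear system `G @ e = 0` in the vector of training errors `e`. If `G` has full column rank, which one would expect when the number of samples does not exceed the number of first-layer weights, the only solution is `e = 0`.

```python
import numpy as np

def gradient_matrix(X, W1, w2, noise, slope=0.1):
    """Build G whose n-th column is (noise_n * a_n * w2) kron x_n.

    For f(x_n) = w2 . (noise_n * a_n * (W1 @ x_n)) with quadratic loss,
    the condition dLoss/dW1 = 0 at a differentiable point reads G @ e = 0,
    where e holds the per-sample training errors.
    """
    pre = W1 @ X                          # (d1, N) hidden pre-activations
    A = np.where(pre > 0, 1.0, slope)     # leaky-ReLU slope per unit and sample
    A = A * noise * w2[:, None]           # assumed multiplicative noise, output weights
    d1, N = A.shape
    d0 = X.shape[0]
    # Column-wise Kronecker (Khatri-Rao) product of A and X.
    return np.einsum("in,jn->ijn", A, X).reshape(d1 * d0, N)

rng = np.random.default_rng(0)
d0, d1, N = 5, 4, 12                      # illustrative sizes with N <= d0 * d1
X = rng.standard_normal((d0, N))          # random inputs, one column per sample
W1 = rng.standard_normal((d1, d0))        # first-layer weights
w2 = rng.standard_normal(d1)              # output weights (single output)
noise = rng.uniform(0.5, 1.5, size=(d1, N))

G = gradient_matrix(X, W1, w2, noise)
rank = np.linalg.matrix_rank(G)
print("G shape:", G.shape, "rank:", rank, "full column rank:", rank == N)
```

The sketch only checks the rank condition for one random draw; it does not train a network or reproduce the smoothed-analysis argument that the rank condition holds for almost every dataset and noise realization.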

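Using smoothed analysis techniques, we provide guarantees on the training loss at differentiable local minima of multilayer neural networks (MNNs) with piecewise linear activation functions, quadratic loss, and a single output. In particular, we prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum for almost every dataset and dropout-like noise realization, and we then extend these results to the case of multiple hidden layers. Our theoretical guarantees place almost no restrictions on the training data and are verified numerically. These results indicate why the highly non-convex loss of such MNNs can be easily optimized by local updates (e.g., stochastic gradient descent), consistent with empirical evidence.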

No Bad Local Minima: Data Independent Training Error Guarantees for Multilayer Neural Networks