In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. Because the activation function is non-differentiable, it has so far been unclear how to completely characterize the stationary points. We propose conditions for stationarity that apply to both the non-differentiable and the differentiable cases. Additionally, we show that if a stationary point does not contain "escape neurons", which are defined via first-order conditions, then it must be a local minimum. Furthermore, in the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process that starts from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, directly linking saddle escaping to the parameter changes of escape neurons. Finally, we fully discuss how network embedding, i.e., instantiating a narrower network within a wider one, reshapes the stationary points.
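To make the objects in the abstract concrete, the following is a minimal NumPy sketch, not the paper's code. It assumes a bias-free scalar-output network f(x) = Σ_j a_j σ(w_j · x) and reads "ReLU-like" as a piecewise-linear activation σ(z) = z for z > 0 and αz otherwise (α = 0 gives ReLU). The two helpers `embed_zero_neuron` and `embed_split_neuron` are hypothetical illustrations of embedding a narrower network into a wider one; they are not necessarily the embeddings analyzed in the paper. Both leave the network function, and therefore the empirical squared loss, unchanged.

```python
import numpy as np

def relu_like(z, alpha=0.0):
    # Assumed piecewise-linear activation: z for z > 0, alpha * z otherwise.
    # alpha = 0 is ReLU; 0 < alpha < 1 is leaky ReLU. Non-differentiable at z = 0.
    return np.where(z > 0, z, alpha * z)

def forward(W, a, X, alpha=0.0):
    # W: (m, d) hidden weights, a: (m,) output weights, X: (n, d) inputs.
    # Returns f(x_i) = sum_j a_j * sigma(w_j . x_i) for every sample, shape (n,).
    return relu_like(X @ W.T, alpha) @ a

def empirical_squared_loss(W, a, X, y, alpha=0.0):
    # L(W, a) = 1/(2n) * sum_i (f(x_i) - y_i)^2
    r = forward(W, a, X, alpha) - y
    return 0.5 * np.mean(r ** 2)

def embed_zero_neuron(W, a, w_new):
    # Hypothetical embedding 1: append a neuron with arbitrary input weights
    # w_new and output weight 0, so it adds nothing to the output.
    return np.vstack([W, w_new]), np.append(a, 0.0)

def embed_split_neuron(W, a, j, lam):
    # Hypothetical embedding 2: duplicate neuron j and split its output weight
    # into lam * a_j and (1 - lam) * a_j; the two copies sum to the original.
    W_new = np.vstack([W, W[j]])
    a_new = np.append(a, (1.0 - lam) * a[j])
    a_new[j] = lam * a[j]
    return W_new, a_new

# Usage: both embeddings map a 4-neuron network to a 5-neuron one with the same loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
W = rng.normal(size=(4, 3))
a = rng.normal(size=4)

base = empirical_squared_loss(W, a, X, y)
W1, a1 = embed_zero_neuron(W, a, rng.normal(size=3))
W2, a2 = embed_split_neuron(W, a, j=1, lam=0.3)
print(np.isclose(base, empirical_squared_loss(W1, a1, X, y)))  # True
print(np.isclose(base, empirical_squared_loss(W2, a2, X, y)))  # True
```

Because such an embedding preserves the function, it sends any parameter point of the narrow network to a point of the wider network with the same loss. Whether the image of a stationary point stays stationary, and of which type, is the kind of question the abstract says the paper addresses.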

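We study the loss landscape of one-hidden-layer neural networks trained with the empirical squared loss. We propose stationarity conditions that apply to both the non-differentiable and the differentiable cases, and show that if a stationary point contains no "escape neurons", then it must be a local minimum. In addition, our analysis allows a full discussion of how network embedding reshapes the stationary points.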

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding