This paper investigates how non-differentiability affects three aspects of neural network training. First, we analyze fully connected neural networks with ReLU activations and show that continuously differentiable networks converge faster than non-differentiable ones. Next, we analyze $L_{1}$ regularization and show that the solutions produced by deep learning solvers are incorrect and counter-intuitive, even for the $L_{1}$-penalized linear model. Finally, we analyze the Edge of Stability problem: we show that all convex, non-smooth, Lipschitz continuous functions display unstable convergence, and we give an example of a result derived for twice-differentiable functions that fails in the once-differentiable setting. More generally, our results suggest that accounting for the non-linearity of neural networks in the training process is essential for developing better algorithms and for better understanding the training process.
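
To make the unstable-convergence claim concrete, the following is a minimal sketch and not the paper's own experiment. It runs constant-step gradient descent on $f(x) = |x|$, which is convex, non-smooth, and Lipschitz continuous. The function names, the starting point, the step size, and the choice of subgradient 0 at the kink are all assumptions made for illustration.

```python
def sign(x: float) -> int:
    """A subgradient of |x|. At the kink x = 0 we (arbitrarily) return 0."""
    return (x > 0) - (x < 0)


def gradient_descent_abs(x0: float, step: float, n_steps: int) -> list:
    """Run constant-step (sub)gradient descent on f(x) = |x|.

    Returns the full list of iterates, starting with x0.
    """
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - step * sign(x))
    return xs


# Illustrative values (assumptions): 1.0 and 0.375 are exactly representable
# in binary floating point, so the iterates below are exact.
trajectory = gradient_descent_abs(x0=1.0, step=0.375, n_steps=20)
losses = [abs(x) for x in trajectory]

print(trajectory[:6])
print(losses[-4:])
```

With these values the iterates settle into a two-point cycle around the minimizer instead of approaching it: the loss keeps alternating between two levels and never decreases below the smaller one. With a constant step the cycle persists for any step size unless an iterate happens to land exactly on the kink.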

Three Ways Non-Differentiability Affects Neural Network Training