Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent and achieve good rates of convergence. In this paper, we show that the regularization term is not necessary to obtain similar results. Given a suitably chosen initialization of the network, a suitable number of gradient descent steps, and a suitable step size, we show that an estimate without a regularization term is universally consistent for bounded predictor variables. Additionally, we show that if the regression function is H\"older smooth with H\"older exponent $1/2 \leq p \leq 1$, the $L_2$ error converges to zero at a rate of approximately $n^{-1/(1+d)}$. Furthermore, in the case of an interaction model, where the regression function consists of a sum of H\"older smooth functions with $d^*$ components, a rate of convergence is derived which does not depend on the input dimension $d$.
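
To make the setting concrete, the estimate described above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $(X_1,Y_1),\dots,(X_n,Y_n)$ denotes the data, $f_{\mathbf{w}}$ a network with weight vector $\mathbf{w}$, $\lambda_n$ the step size, $t_n$ the number of gradient descent steps, $m$ the regression function, and $m_n$ the resulting estimate. Gradient descent is applied to the empirical $L_2$ risk without any penalty term:

$$
F_n(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - f_{\mathbf{w}}(X_i) \right|^2,
\qquad
\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \lambda_n \, \nabla_{\mathbf{w}} F_n\left(\mathbf{w}^{(t)}\right),
\quad t = 0, \dots, t_n - 1.
$$

Under the same assumed notation, the $L_2$ error of the estimate and the H\"older smoothness condition with exponent $p$ and some constant $C > 0$ read

$$
\int \left| m_n(x) - m(x) \right|^2 \, \mathbf{P}_X(dx),
\qquad
\left| m(x) - m(z) \right| \leq C \, \| x - z \|^{p} \quad \text{for all } x, z.
$$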

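With a suitable initialization, number of gradient descent steps, and step size, over-parametrized deep neural network estimates are universally consistent for bounded predictor variables without any regularization term. For H\"older smooth regression functions, the $L_2$ error converges at a rate of approximately $n^{-1/(1+d)}$, and for interaction models the rate of convergence does not depend on the input dimension $d$.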

Analysis of the $L_2$ error of over-parametrized deep neural network estimates learned by gradient descent without regularization