We analyze the generalization properties of two-layer neural networks trained with gradient descent (GD) in the neural tangent kernel (NTK) regime. For early-stopped GD, we derive fast rates of convergence that are known to be minimax optimal in the framework of nonparametric regression in reproducing kernel Hilbert spaces. Along the way, we precisely track the number of hidden neurons required for generalization and improve on existing results. We further show that the weights remain in a neighborhood of their initialization throughout training, with a radius that depends on structural assumptions such as the degree of smoothness of the regression function and the eigenvalue decay of the integral operator associated with the NTK.
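
As a rough numerical illustration of the setting described above, the following sketch trains the hidden layer of a two-layer network with full-batch GD for a fixed number of steps and records how far the hidden weights move from their initialization. It is not the construction analyzed in the paper: the ReLU activation, the fixed random output signs, the 1/sqrt(M) output scaling, the synthetic data, the two widths, the step size, and the stopping time are all assumptions chosen for illustration, and `train_two_layer` is a hypothetical helper name.

```python
import numpy as np


def train_two_layer(M, X, y, lr=0.5, steps=200, seed=0):
    """Full-batch GD on the hidden weights of f(x) = a . relu(W x) / sqrt(M).

    Assumed setup (not taken from the paper): ReLU activation, output
    weights a fixed at random signs, Gaussian initialization of W, and
    early stopping after a fixed number of `steps`.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W0 = rng.standard_normal((M, d))       # hidden weights at initialization
    a = rng.choice([-1.0, 1.0], size=M)    # fixed output signs, never trained
    W = W0.copy()

    def predict(W):
        return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(M)

    losses = []
    for _ in range(steps):
        pre = X @ W.T                       # (n, M) pre-activations
        r = predict(W) - y                  # (n,) residuals
        losses.append(0.5 * np.mean(r ** 2))
        act = (pre > 0).astype(float)       # ReLU derivative, shape (n, M)
        # Gradient of the empirical squared loss with respect to W, shape (M, d).
        grad = ((act * r[:, None]) * a[None, :]).T @ X / (n * np.sqrt(M))
        W -= lr * grad
    losses.append(0.5 * np.mean((predict(W) - y) ** 2))

    # Largest per-neuron distance from initialization after training.
    radius = np.max(np.linalg.norm(W - W0, axis=1))
    return losses, radius


# Synthetic regression data on the unit sphere (an assumption for this sketch).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(3.0 * X[:, 0]) + 0.1 * data_rng.standard_normal(50)

results = {M: train_two_layer(M, X, y) for M in (8, 4096)}
for M, (losses, radius) in results.items():
    print(f"M={M:5d}  initial loss={losses[0]:.4f}  "
          f"final loss={losses[-1]:.4f}  max weight movement={radius:.4f}")
```

Under this scaling each neuron receives a gradient of order 1/sqrt(M), so the wider network would be expected to stay closer to its initialization. The sketch only reports this quantity for two widths and makes no claim about the rates or the neuron counts derived in the paper.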


How Many Neurons Do We Need? A Refined Analysis for Shallow Networks Trained with Gradient Descent