This paper studies the convergence rate of stochastic gradient descent (SGD) applied to overparameterized two-layer neural networks. Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the reproducing kernel Hilbert space (RKHS) generated by the NTK. This framework lets us examine the interplay between kernel methods and optimization, and it clarifies the optimization dynamics and convergence properties of such networks. We establish sharp convergence rates for the last iterate of SGD. We also relax the requirement on the number of neurons: the required width is reduced from an exponential to a polynomial dependence on the sample size or the number of iterations. This relaxation allows more flexibility in the design and scaling of neural networks and deepens the theoretical understanding of neural network models trained with SGD.

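By combining the Neural Tangent Kernel (NTK) approximation with convergence analysis in the reproducing kernel Hilbert space (RKHS) generated by the NTK, this work studies the convergence rate of stochastic gradient descent (SGD) applied to overparameterized two-layer neural networks. It aims to give a deeper understanding of the convergence behavior of SGD in such networks, examines the interplay between kernel methods and optimization, and sheds light on the optimization dynamics and convergence properties of neural networks. The work also makes substantial progress on the constraint on the number of neurons, reducing it from an exponential to a polynomial dependence. This improvement allows more flexibility in the design and scaling of neural networks and deepens the theoretical understanding of neural network models trained with SGD.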

Stochastic Gradient Descent for Two-Layer Neural Networks