We extend the global convergence result of Chatterjee \cite{chatterjee2022convergence} to stochastic gradient descent (SGD) for non-convex objective functions. Under minimal additional assumptions, which can be realized by finitely wide neural networks, we prove that if SGD is initialized inside a local region where the \L{}ojasiewicz condition holds, then with positive probability the iterates converge to a global minimum inside this region. A key component of our proof is to ensure that the entire SGD trajectory stays inside the local region with positive probability. To this end, we assume that the SGD noise scales with the objective function, a property known as machine learning noise that is achievable in many real examples. Furthermore, we give a negative argument showing why bounded noise combined with Robbins-Monro type step sizes is not enough to keep this key component valid.

Convergence of stochastic gradient descent under a local \L{}ojasiewicz condition for deep neural networks