It has repeatedly been observed that loss minimization by stochastic gradient descent (SGD) leads to heavy-tailed distributions of neural network parameters. Here, we analyze a continuous diffusion approximation of SGD, called homogenized stochastic gradient descent, show that it is asymptotically heavy-tailed, and give explicit upper and lower bounds on its tail-index. We validate these bounds in numerical experiments and show that they are typically close approximations to the empirical tail-index of SGD iterates. In addition, their explicit form enables us to quantify the interplay between optimization parameters and the tail-index. In doing so, we contribute to the ongoing discussion on links between heavy tails and the generalization performance of neural networks, as well as the ability of SGD to avoid suboptimal local minima.

Emergence of heavy tails in homogenized stochastic gradient descent