This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks from small initializations. The networks considered are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. For sufficiently small initializations, it is shown that during the early stages of training the weights of the neural network remain small in norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for the square loss and under a separability assumption on the weights of the neural network, a similar directional convergence of the gradient flow dynamics is shown near certain saddle points of the loss function.
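To make the homogeneity assumption concrete, the following is a minimal numerical sketch, not the construction used in the paper. It assumes a hypothetical two-layer network with a squared activation, `network(x, W, v) = v^T (W x)^2`, chosen only because it is smooth and positively homogeneous of order three in the stacked weights `(W, v)`. The names `network`, `W`, `v`, and the scale `c` are illustrative assumptions.

```python
import numpy as np


def network(x, W, v):
    """Hypothetical two-layer network with squared activation: v^T (W x)^2.

    The output is quadratic in W and linear in v, so scaling all weights
    by c >= 0 scales the output by c**3 (order of homogeneity 3 > 2).
    """
    return v @ (W @ x) ** 2


# Illustrative random input and weights (assumed shapes, not from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((5, 4))
v = rng.standard_normal(5)

# Scale every weight by a small factor c, mimicking a small initialization.
c = 0.1
lhs = network(x, c * W, c * v)    # output at the scaled weights
rhs = c ** 3 * network(x, W, v)   # c^3 times the output at the original weights

homogeneity_gap = abs(lhs - rhs)
print(homogeneity_gap)
```

The gap should vanish up to floating-point error. The same identity shows that, for this toy network, shrinking the weights by a factor `c` shrinks the output by `c**3`, which is the kind of scaling that small-initialization analyses of homogeneous networks rely on.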

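This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks with small initializations. It shows that, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction near the Karush-Kuhn-Tucker (KKT) points of the neural correlation function. In addition, for square loss and under a separability assumption on the weights of the neural network, the gradient flow dynamics exhibit a similar directional convergence near certain saddle points of the loss function.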

Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations