We present a theoretical analysis of a modified natural gradient descent method in the function space of neural networks, based on the eigendecompositions of the neural tangent kernel and the Fisher information matrix. We first derive an analytical expression for the function learned by this modified natural gradient under a Gaussian distribution assumption and in the infinite-width limit. We then explicitly derive the generalization error of the learned neural network function using tools from eigendecomposition and statistical theory. By decomposing the total generalization error into contributions from the different eigenspaces of the kernel in function space, we propose a criterion for balancing the error stemming from the training set against the error stemming from the distribution discrepancy between the training set and the true data. Using this criterion, we establish that modifying the training direction of the neural network in function space reduces the total generalization error. Furthermore, we demonstrate that this theoretical framework can explain many existing results on generalization-enhancing methods. These theoretical results are illustrated by numerical examples on synthetic data.
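
To make the idea of modifying the training direction in function space concrete, the following is a minimal sketch and not the method analyzed here. It assumes a squared loss, uses an RBF kernel as a stand-in for the neural tangent kernel on the training inputs, and applies a hypothetical boolean mask `keep` in place of the proposed criterion. The names `rbf_kernel`, `function_space_update`, and `keep` are illustrative assumptions. With every eigendirection kept, one unit step removes the whole training residual, which is the natural-gradient-like behavior in function space; masking some eigendirections leaves the residual untouched along them.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=0.5):
    """Stand-in kernel (assumption): an RBF kernel replaces the NTK here."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def function_space_update(K, residual, keep=None, lr=1.0):
    """Return the change in training outputs for one function-space step.

    K        : symmetric kernel matrix on the training inputs
    residual : current outputs minus targets, f(X) - y
    keep     : hypothetical boolean mask over the eigendirections of K
               (ascending eigenvalue order); None keeps all of them
    """
    eigvals, eigvecs = np.linalg.eigh(K)           # K = V diag(eigvals) V^T
    if keep is None:
        keep = np.ones(eigvals.shape, dtype=bool)
    coeffs = eigvecs.T @ residual                  # residual in the eigenbasis
    coeffs = np.where(keep, coeffs, 0.0)           # drop the masked directions
    return -lr * (eigvecs @ coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 8)
    K = rbf_kernel(x, x)
    residual = rng.normal(size=x.shape[0])

    # Unmodified step: the residual vanishes up to floating-point error.
    full_step = function_space_update(K, residual)
    print(np.linalg.norm(residual + full_step))

    # Modified step: train only along the three largest eigendirections.
    keep = np.arange(x.shape[0]) >= 5
    partial_step = function_space_update(K, residual, keep=keep)
    print(np.linalg.norm(residual + partial_step))
```

Here the mask is chosen by hand. In the analysis, the choice of eigendirections would instead follow from the criterion that balances the training-set error against the error caused by the distribution discrepancy.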

Modifying Training Directions in Function Space to Reduce Generalization Error