Optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent algorithm, implicit gradient descent (IGD) performs better on some multi-scale problems. In this paper, we provide a convergence analysis of implicit gradient descent for training over-parameterized two-layer PINNs. We first demonstrate the positive definiteness of the Gram matrices for general smooth activation functions, such as the sigmoidal, softplus, and tanh functions. Over-parameterization then allows us to show that randomly initialized IGD converges to a globally optimal solution at a linear convergence rate. Moreover, due to the different training dynamics, the learning rate of IGD can be chosen independently of the sample size and the least eigenvalue of the Gram matrix.
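
To illustrate why an implicit update can tolerate learning rates that break ordinary gradient descent, the following minimal sketch compares the two on a toy ill-conditioned quadratic loss. It is not the PINN setting analyzed in the paper: the loss, the matrix `A`, the vector `b`, and the function names `explicit_gd` and `implicit_gd` are all assumptions made for illustration. For a quadratic loss the implicit step theta_next = theta - lr * grad(theta_next) has a closed form, whereas for a neural network it would generally require solving a nonlinear equation at every step.

```python
import numpy as np

def explicit_gd(A, b, theta0, lr, steps):
    """Ordinary gradient descent on L(theta) = 0.5 * theta^T A theta - b^T theta."""
    theta = theta0.copy()
    for _ in range(steps):
        # The gradient is evaluated at the current iterate.
        theta = theta - lr * (A @ theta - b)
    return theta

def implicit_gd(A, b, theta0, lr, steps):
    """Implicit (backward Euler) gradient descent on the same quadratic loss."""
    theta = theta0.copy()
    # theta_next = theta - lr * (A @ theta_next - b)
    # rearranges to (I + lr * A) theta_next = theta + lr * b.
    M = np.eye(len(b)) + lr * A
    for _ in range(steps):
        theta = np.linalg.solve(M, theta + lr * b)
    return theta

# Toy two-scale problem (hypothetical): curvatures 1 and 1000.
A = np.diag([1.0, 1000.0])
b = np.array([1.0, 1.0])
theta_star = np.linalg.solve(A, b)  # exact minimizer
theta0 = np.zeros(2)
lr = 1.0  # far above the explicit stability limit 2 / 1000

err_implicit = np.linalg.norm(implicit_gd(A, b, theta0, lr, 50) - theta_star)
err_explicit = np.linalg.norm(explicit_gd(A, b, theta0, lr, 50) - theta_star)
print("implicit error:", err_implicit)
print("explicit error:", err_explicit)
```

In this toy problem the implicit iteration contracts the error along each eigen-direction by 1 / (1 + lr * lambda), which is below 1 for any positive learning rate, while the explicit iteration multiplies it by (1 - lr * lambda) and diverges along the stiff direction.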

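This paper provides a convergence analysis of implicit gradient descent for training over-parameterized two-layer physics-informed neural networks, and proves that the Gram matrices for common smooth activation functions (such as the sigmoid, softplus, and tanh functions) are positive definite. With over-parameterization, randomly initialized implicit gradient descent converges to a globally optimal solution at a linear rate, and, owing to the different training dynamics, the learning rate can be chosen independently of the sample size and the least eigenvalue of the Gram matrix.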

Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks