The implicit bias induced by the training of neural networks has become a
topic of rigorous study. In the limit of gradient flow and gradient descent
with appropriate step size, it has been shown that when one trains a deep
linear network with logistic or exponential loss on linearly separable data,
the weights converge to rank-1 matrices. In this paper, we extend this
theoretical result to the last few linear layers of the much wider class of
nonlinear ReLU-activated feedforward networks containing fully-connected layers
and skip connections. As in the linear case, the proof relies on specific
local training invariances, sometimes referred to as alignment. We show that
these hold for submatrices whose neurons are stably activated on all training
examples, which is consistent with empirical results in the literature. We also
show that this is not true in general for the full matrix of ReLU
fully-connected layers.
The argument decomposes the network into a multilinear function and another
ReLU network whose weights are constant under a certain parameter directional
convergence.
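
For orientation, the kind of training invariance meant here can be written out
in the deep linear case. This is only a sketch under assumptions not stated in
the abstract: the network is taken to be f(x) = W_L ... W_1 x and is trained by
gradient flow on a loss \mathcal{L}. The symbols W_k, L and \mathcal{L} are
notation introduced for this illustration, and the statement proved for
stably-activated ReLU submatrices may take a different form.

```latex
% Gradient flow: \dot{W}_k = -\partial \mathcal{L} / \partial W_k.
% For a product of linear layers, the chain rule gives
\[
  W_{k+1}^{\top}\,\frac{\partial \mathcal{L}}{\partial W_{k+1}}
  \;=\;
  \frac{\partial \mathcal{L}}{\partial W_{k}}\,W_{k}^{\top},
  \qquad k = 1, \dots, L-1,
\]
% so that W_{k+1}^\top \dot{W}_{k+1} = \dot{W}_k W_k^\top. Adding each side
% to its own transpose yields the conserved quantity
\[
  \frac{d}{dt}\Bigl( W_{k+1}^{\top} W_{k+1} - W_{k} W_{k}^{\top} \Bigr) = 0 .
\]
```

Under these assumptions the difference between adjacent Gram matrices stays
fixed during training, which is what ties the singular structure of
neighbouring layers together.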

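This paper studies implicit bias in the training of neural networks. It asks
whether a phenomenon known for deep linear networks also occurs in
ReLU-activated feedforward networks with fully-connected layers and skip
connections: namely, that in the limit of gradient flow and gradient descent,
the weights of a deep linear network trained with logistic or exponential loss
on linearly separable data converge to rank-1 matrices. It proposes several
training invariances, and the proof is built on a multilinear function together
with a ReLU network whose weights are constant under a specific parameter
directional convergence.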