Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is \emph{impossible} to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, \emph{except} for the "balancedness" property identified in Du et al. [2018]. Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and provide some clues on what this framework should be.

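The implicit regularization (i.e., implicit bias) of nonlinear neural networks under regression losses such as the square loss has become an active research area, yet it remains poorly understood. We prove that even for a single ReLU neuron, the implicit regularization cannot be characterized by any explicit function of the model parameters (although it can be characterized approximately), and that a similar phenomenon holds for networks with one hidden layer. Our results suggest that a more general framework than the one considered so far is needed to understand implicit regularization for nonlinear predictors, and they provide some clues toward such a framework.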

Implicit Regularization of ReLU Networks under the Square Loss