Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$-norm interpolation in high-dimensional linear regression. Motivated by the connection with overparametrized neural networks, we consider the case of random features. We study two distinct models for the feature distribution: a linear model, in which the feature vectors $x_i\in{\mathbb R}^p$ are obtained by applying a linear transform to vectors of i.i.d. entries, $x_i = \Sigma^{1/2}z_i$ (with $z_i\in{\mathbb R}^p$); and a nonlinear model, in which the features are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(Wz_i)$ (with $z_i\in{\mathbb R}^d$, and $\varphi$ an activation function acting independently on the coordinates of $Wz_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the `double descent' behavior of the generalization error and the potential benefit of overparametrization.
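
As a minimal sketch of the two feature models and the minimum $\ell_2$-norm interpolator described above, the following NumPy code builds features and fits by pseudoinverse. The function names, the choice $\varphi = \tanh$, the scaling of $W$, the noise level, and the sizes $n$, $d$, $p$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_features(Z, Sigma_sqrt):
    # Linear model: row i is x_i = Sigma^{1/2} z_i, with z_i in R^p.
    return Z @ Sigma_sqrt.T

def nonlinear_features(Z, W, phi=np.tanh):
    # Nonlinear model: row i is x_i = phi(W z_i), with W in R^{p x d}.
    # phi = tanh is an assumed activation, applied coordinate-wise.
    return phi(Z @ W.T)

def min_norm_interpolator(X, y):
    # Minimum l2-norm least-squares solution beta = X^+ y.
    # When p > n and X has full row rank, this beta interpolates the data.
    return np.linalg.pinv(X) @ y

# Assumed toy sizes: n samples, d input dimensions, p > n random features.
rng = np.random.default_rng(0)
n, d, p = 50, 20, 200

Z = rng.standard_normal((n, d))                    # inputs z_i with i.i.d. entries
y = Z @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)  # noisy responses

W = rng.standard_normal((p, d)) / np.sqrt(d)       # random first-layer weights
X = nonlinear_features(Z, W)                       # n x p feature matrix

beta = min_norm_interpolator(X, y)
train_err = np.max(np.abs(X @ beta - y))           # largest training residual
print(f"features: {X.shape}, max training residual: {train_err:.2e}")
```

In this overparametrized setting ($p > n$) the training residual is zero up to floating-point error, which is what makes the estimator an interpolator; its test error would have to be measured separately on fresh data.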

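This paper studies minimum $\ell_2$-norm ("ridgeless") interpolation in high-dimensional least squares regression, and considers two distinct models for the feature distribution: a linear model and a nonlinear model.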

Surprises in High-Dimensional Ridgeless Least Squares Interpolation