Among theories for why large-scale machine learning models generalize despite being vastly overparameterized, which assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical generalized cross-validation (GCV) estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.
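As a rough illustration of the estimator the abstract refers to, the sketch below computes the textbook GCV score for kernel ridge regression from training data alone. It assumes the standard linear-smoother form, S = K (K + n·lam·I)^{-1}, with GCV equal to the training mean squared error divided by (1 - tr(S)/n)^2. The function name `gcv_risk_estimate`, the `n * lam` scaling of the ridge term, and the synthetic linear-kernel data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gcv_risk_estimate(K, y, lam):
    """Textbook GCV score for kernel ridge regression (illustrative sketch).

    K   : (n, n) symmetric positive semidefinite kernel matrix on training points
    y   : (n,) training targets
    lam : ridge parameter, assumed > 0 (scaled by n below; a convention chosen here)
    """
    n = K.shape[0]
    A = K + n * lam * np.eye(n)
    # Smoother matrix S = K A^{-1}. K and A commute, so A^{-1} K gives the same matrix.
    S = np.linalg.solve(A, K)
    residual = y - S @ y
    train_mse = np.mean(residual ** 2)
    # Fraction of "residual degrees of freedom": 1 - tr(S) / n.
    dof_frac = 1.0 - np.trace(S) / n
    return train_mse / dof_frac ** 2


# Hypothetical usage on synthetic data with a linear kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
K = X @ X.T
print(gcv_risk_estimate(K, y, lam=0.1))
```

Because the score needs only the kernel matrix and the training targets, it can be evaluated for many values of `lam` without holding out data. Whether it tracks the true generalization risk in a given setting is the question the abstract addresses.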

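This work studies the qualitative phenomena of generalization in machine learning models. It finds that most theoretical analyses fail to capture these phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks and real data. In contrast, the GCV estimator is found empirically to predict generalization risk accurately in such overparameterized settings, and the authors prove that it converges to the generalization risk whenever a local random matrix law holds. Finally, random matrix theory is used to explain why pretrained representations generalize better and what factors govern scaling laws for kernel regression. The findings suggest that random matrix theory may be central to understanding the properties of neural representations.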

More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize