May, 2025
Precise gradient descent training dynamics for finite-width multi-layer neural networks
Qiyang Han, Masaaki Imaizumi
TL;DR
This paper gives the first precise characterization of the distribution of gradient descent iterates for general multi-layer neural networks, in the finite-width proportional regime where the sample size grows in proportion to the feature dimension. The proposed non-asymptotic state evolution theory reveals Gaussian fluctuations in the first-layer weights and concentration of the deeper-layer weights. Most notably, the theory yields a consistent estimate of the generalization error at every iteration, which can guide early stopping and hyperparameter tuning.
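The practical point of the TL;DR, that a per-iteration generalization-error estimate can guide early stopping, can be illustrated with a minimal sketch. The code below trains a finite-width two-layer network by gradient descent on data from a single-index model and records an error proxy at every iterate. The width, link function, step size, and the use of a held-out set as the error proxy are illustrative assumptions; the paper instead derives a consistent per-iterate error estimate from its state evolution theory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-index data model: y = phi(<x, beta>) + noise (illustrative choice of phi).
n, d = 2000, 400                        # proportional regime: n and d of the same order
beta = rng.normal(size=d) / np.sqrt(d)
phi = np.tanh                           # hypothetical link function

def sample(n_samples):
    X = rng.normal(size=(n_samples, d))
    y = phi(X @ beta) + 0.1 * rng.normal(size=n_samples)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(n // 4)

# Two-layer network of finite width m (width stays bounded as n and d grow).
m = 8
W1 = rng.normal(size=(d, m)) / np.sqrt(d)
w2 = rng.normal(size=m) / np.sqrt(m)

def forward(X):
    H = np.tanh(X @ W1)                 # first-layer features
    return H, H @ w2                    # network output

lr, n_steps = 0.1, 200
best_err, best_step = np.inf, 0
for t in range(n_steps):
    H, pred = forward(X_train)
    resid = pred - y_train
    # Manual gradients of the mean squared loss for the two-layer net.
    grad_w2 = H.T @ resid / n
    grad_H = np.outer(resid, w2) * (1 - H ** 2)   # backprop through tanh
    grad_W1 = X_train.T @ grad_H / n
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

    # Held-out MSE as a stand-in for the per-iterate generalization error.
    _, test_pred = forward(X_test)
    test_err = np.mean((test_pred - y_test) ** 2)
    if test_err < best_err:
        best_err, best_step = test_err, t

print(f"best held-out MSE {best_err:.4f} at iteration {best_step} (early-stopping point)")
```

Tracking the error curve over iterations is what makes early stopping possible; the paper's contribution is that this curve can be estimated consistently from theory rather than from a held-out set.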
Abstract
In this paper, we provide the first precise distributional characterization of gradient descent iterates for general multi-layer neural networks under the canonical single-index regression model, in the `finite-width proportional regime' …