BriefGPT.xyz
Feb, 2021
Explaining Neural Scaling Laws
Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma
TL;DR
This work proposes a theory that explains and connects the precisely defined power-law relations between the test loss of trained neural networks and either the training dataset size or the number of network parameters, and accounts for the resolution-limited scaling regime by demonstrating an equivalence between the data manifold and the spectra of certain kernels.
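The power-law scaling described above can be illustrated with a quick fit on synthetic data. This is only a sketch: the exponent `alpha_true` and prefactor `c_true` are made-up values, not results from the paper.

```python
import numpy as np

# Hypothetical power-law test loss in dataset size D: L(D) = c * D**(-alpha).
# alpha_true and c_true are illustrative constants, not values from the paper.
alpha_true, c_true = 0.35, 4.0
D = np.array([1e3, 1e4, 1e5, 1e6])
loss = c_true * D ** (-alpha_true)

# A power law is a straight line in log-log space:
#   log L = log c - alpha * log D,
# so a least-squares line fit recovers the scaling exponent.
slope, intercept = np.polyfit(np.log(D), np.log(loss), 1)
alpha_est, c_est = -slope, np.exp(intercept)
print(alpha_est, c_est)
```

In practice one fits measured losses at several dataset (or parameter-count) sizes the same way; deviations from the fitted line indicate the onset of a different scaling regime.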
Abstract
The test loss of well-trained neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains and connects these scaling laws.