Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state of the art. This paper presents a large-scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, with power-law exponents (the "steepness" of the learning curve) that have yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.

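This paper takes an empirical approach, using data from four machine learning domains (machine translation, language modeling, image processing, and speech recognition), to study the relationship between training set size, model size, and generalization error. The results show that generalization error follows power-law scaling, and that model improvements only shift the error without affecting the power-law exponent. In addition, model size grows sublinearly with data size. These findings have significant implications for deep learning research, practice, and system design.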
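The power-law relationship summarized above can be made concrete with a small sketch. It assumes the simplest form, error ≈ alpha * m^beta with beta < 0, where m is the training set size, and fits it by least squares in log-log space. The function names, the symbols alpha and beta, and the learning-curve values are illustrative assumptions, not the paper's methodology or measurements, and the sketch ignores any irreducible-error floor.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~= alpha * sizes**beta by least squares in log-log space.

    Returns (alpha, beta). Assumes all inputs are strictly positive and that
    the measured range has no irreducible-error floor (a simplification).
    """
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope: the learning-curve "steepness"
    alpha = math.exp(mean_y - beta * mean_x)   # intercept: shifts the whole curve
    return alpha, beta


def samples_needed(alpha, beta, target_error):
    """Invert the fitted curve: the training set size predicted to reach target_error."""
    return (target_error / alpha) ** (1.0 / beta)


# Synthetic, made-up learning curves (NOT data from the paper).
sizes = [1000 * 2 ** k for k in range(8)]
baseline = [3.0 * m ** -0.3 for m in sizes]   # hypothetical baseline model
improved = [2.0 * m ** -0.3 for m in sizes]   # hypothetical better model: lower alpha, same beta

for name, errors in [("baseline", baseline), ("improved", improved)]:
    alpha, beta = fit_power_law(sizes, errors)
    needed = samples_needed(alpha, beta, 0.05)
    print(f"{name}: alpha={alpha:.3f}, beta={beta:.3f}, samples for error 0.05: {needed:.3g}")
```

In this toy setup the two curves are parallel lines on a log-log plot: the improved model lowers the intercept but leaves the slope unchanged, which is the pattern the abstract reports. Inverting the fit, as `samples_needed` does, is one way such a curve could inform accuracy targets and decisions about data set growth.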

Deep Learning Scaling is Predictable, Empirically