Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show that, both in theory and in practice, we can break beyond power law scaling and reduce it to exponential scaling instead, provided we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this exponential scaling prediction empirically as a function of pruned dataset size, and indeed observe better than power law scaling on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find that most existing high-performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore develop a new simple, cheap, and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
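
The pruning step described above, in which a metric ranks the training examples and the ranking determines which ones are discarded for any target dataset size, can be illustrated with a minimal sketch. The function name `prune_by_metric`, its parameters, and the toy scores are hypothetical and chosen only for illustration; the abstract does not specify which end of the ranking should be kept, so that choice is exposed here as an assumed `keep_highest` flag.

```python
import numpy as np


def prune_by_metric(scores, keep_fraction, keep_highest=True):
    """Return the sorted indices of examples retained after pruning.

    scores        : one pruning-metric value per training example
                    (hypothetical; any per-example score would do).
    keep_fraction : fraction of the dataset to retain, in (0, 1].
    keep_highest  : assumption of this sketch; if True, keep the
                    highest-scoring examples, otherwise the lowest.
    """
    scores = np.asarray(scores, dtype=float)
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Number of examples to retain (at least one).
    n_keep = max(1, int(round(keep_fraction * len(scores))))

    # Rank examples by score; a stable sort keeps ties in original order.
    order = np.argsort(scores, kind="stable")  # ascending
    if keep_highest:
        order = order[::-1]

    # The first n_keep entries of the ranking form the pruned dataset.
    return np.sort(order[:n_keep])


# Toy example: five examples with made-up metric values.
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
kept = prune_by_metric(scores, keep_fraction=0.6)
print(kept.tolist())  # [0, 2, 3]
```

Because a single ranking is reused for every `keep_fraction`, the pruned datasets are nested: each smaller subset is contained in every larger one, which matches the idea of one metric defining the discard order for any pruned dataset size.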

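Using a high-quality data pruning metric to break the power law scaling of error with training dataset size can yield better error scaling. This paper conducts an empirical study, benchmarks ten different data pruning metrics, and finds that most existing metrics perform poorly on ImageNet. The authors therefore develop a new self-supervised pruning metric whose performance is comparable to that of the best supervised metrics. The study suggests that discovering good data pruning metrics may be a viable path to substantially reducing the resource costs of deep learning.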

Beyond neural scaling laws: beating power law scaling via data pruning