Mislabeled, duplicated, or biased data in real-world scenarios can lead to
prolonged training and even hinder model convergence. Traditional solutions
that prioritize either easy or hard samples lack the flexibility to handle this
variety of data problems simultaneously. Recent work has proposed a more reasonable data selection
principle by examining the data's impact on the model's generalization loss.
However, its practical adoption relies on less principled approximations and
additional clean holdout data. This work solves these problems by leveraging a
lightweight Bayesian treatment and incorporating off-the-shelf zero-shot
predictors built on large-scale pre-trained models. The resulting algorithm is
efficient and easy to implement. We perform extensive empirical studies on
challenging benchmarks with considerable data noise and imbalance in the online
batch selection scenario, and observe superior training efficiency over
competitive baselines. Notably, on the challenging WebVision benchmark, our
method can achieve similar predictive performance with significantly fewer
training iterations than leading data selection methods.

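Summary: a lightweight Bayesian treatment combined with off-the-shelf zero-shot predictors built on large-scale pre-trained models addresses mislabeled, duplicated, or biased training data in real-world scenarios and improves training efficiency.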

Towards Accelerated Model Training via Bayesian Data Selection

Training on web-scale data can take months, but most of the computation and time
is wasted on redundant and noisy points that are already learnt or not learnable.
To accelerate training, we introduce Reducible Holdout Loss Selection
(RHO-LOSS), a simple but principled technique which selects approximately those
points for training that most reduce the model's generalization loss. As a
result, RHO-LOSS mitigates the weaknesses of existing data selection methods:
techniques from the optimization literature typically select 'hard' (e.g. high
loss) points, but such points are often noisy (not learnable) or less
task-relevant. Conversely, curriculum learning prioritizes 'easy' points, but
such points need not be trained on once learned. In contrast, RHO-LOSS selects
points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains
in far fewer steps than prior art, improves accuracy, and speeds up training on
a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and
BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in
18x fewer steps and reaches 2% higher final accuracy than uniform data
shuffling.
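
The selection rule described above can be illustrated with a small sketch. The abstract does not spell out the scoring formula, so this assumes the usual reading of "reducible holdout loss": a point's loss under the model being trained minus its loss under a separate model fitted on holdout data. The function name, argument names, and the toy loss values are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def select_by_reducible_loss(train_loss, holdout_model_loss, k):
    """Pick the k points in a batch with the largest reducible holdout loss.

    train_loss: per-point loss under the model currently being trained.
    holdout_model_loss: per-point loss under a separate model fitted on
        holdout data (assumed here to estimate the loss that cannot be reduced).
    """
    train_loss = np.asarray(train_loss, dtype=float)
    holdout_model_loss = np.asarray(holdout_model_loss, dtype=float)
    reducible = train_loss - holdout_model_loss
    # Sort by descending score; a stable sort keeps ties in batch order.
    order = np.argsort(-reducible, kind="stable")
    return order[:k], reducible


# Hypothetical per-point losses for a batch of four candidate points.
train_loss = [2.5, 0.1, 2.4, 1.0]
holdout_model_loss = [2.4, 0.05, 0.3, 0.2]
selected, reducible = select_by_reducible_loss(train_loss, holdout_model_loss, k=2)
print(selected.tolist())
```

In this toy batch, point 0 has a high loss under both models (likely noisy, so not learnable), point 1 has a low loss under both (already learnt), and points 2 and 3 have a high training loss but a low holdout-model loss. The sketch therefore selects points 2 and 3, which matches the "learnable, worth learning, and not yet learnt" criterion.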

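Summary: selecting training points by their reducible holdout loss limits the interference of redundant and noisy points with learning. RHO-LOSS trains in far fewer steps than prior methods, improves accuracy, and speeds up training across a wide range of datasets, hyperparameters, and architectures.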

Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

This paper explores the generalization loss of linear regression in variably
parameterized families of models, both under-parameterized and
over-parameterized. We show that the generalization curve can have an arbitrary
number of peaks, and moreover, locations of those peaks can be explicitly
controlled. Our results highlight the fact that both the classical U-shaped
generalization curve and the recently observed double descent curve are not
intrinsic properties of the model family. Instead, their emergence is due to
the interaction between the properties of the data and the inductive biases of
learning algorithms.
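
As a rough illustration of what a generalization curve is for linear regression, the sketch below measures held-out error of a minimum-norm least-squares fit as the number of features used varies. It does not reproduce the paper's construction for placing peaks at chosen locations; it only shows the measurement in both the under-parameterized and over-parameterized regimes. The Gaussian data model, the function name `heldout_mse`, and all parameter values are assumptions made for this example.

```python
import numpy as np


def heldout_mse(n_train, n_features, d_total=60, n_test=2000, noise=0.5, seed=0):
    """Held-out MSE of a least-squares fit that uses the first n_features columns.

    Data are synthetic (assumed): Gaussian inputs, a random linear target
    over d_total features, and Gaussian label noise on the training set.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d_total) / np.sqrt(d_total)
    X_train = rng.normal(size=(n_train, d_total))
    X_test = rng.normal(size=(n_test, d_total))
    y_train = X_train @ beta + noise * rng.normal(size=n_train)
    y_test = X_test @ beta
    # The pseudoinverse gives the ordinary least-squares solution when
    # n_features <= n_train and the minimum-norm interpolant otherwise.
    w = np.linalg.pinv(X_train[:, :n_features]) @ y_train
    pred = X_test[:, :n_features] @ w
    return float(np.mean((pred - y_test) ** 2))


# Error at an under-parameterized, a threshold, and an over-parameterized size.
errs = {p: heldout_mse(n_train=20, n_features=p) for p in (5, 20, 60)}
for p, err in errs.items():
    print(p, round(err, 3))
```

With these settings the error is typically largest when the number of features equals the number of training points, which is the familiar double descent peak. The paper's claim is that such peaks are not intrinsic to the model family: their number and locations depend on how the data and the learning algorithm's inductive biases interact.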

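Summary: the study examines the generalization loss of linear regression in variably parameterized families of models and shows that the generalization curve can have an arbitrary number of peaks whose locations can be explicitly controlled. The results indicate that the classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family, but arise from the interaction between the data and the inductive biases of learning algorithms.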