This paper addresses the limitations of previous training methods that emphasize either easy examples (as in self-paced learning) or difficult examples (as in hard example mining). Inspired by active learning, we propose two alternatives that re-weight training samples based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance of the predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold (threshold closeness). Extensive experimental results on multiple datasets show that our methods reliably improve accuracy across various network architectures, and that they provide additional gains on top of other popular training tools such as ADAM, dropout, and distillation.
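
The two uncertainty estimates described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the smoothing constant `eps`, the use of `p * (1 - p)` as the closeness measure (which assumes a decision threshold of 0.5), and the normalization to mean 1 are all assumptions made for illustration.

```python
import numpy as np

def variance_weights(prob_history, eps=0.05):
    """Weight samples by the spread of their correct-class probability.

    prob_history has shape (n_samples, n_iterations): one recorded
    probability of the correct class per sample per SGD iteration.
    eps is a hypothetical smoothing constant so no weight is zero.
    """
    std = np.std(prob_history, axis=1)
    w = std + eps
    return w / w.mean()  # normalize so the average weight is 1

def threshold_closeness_weights(prob_history, eps=0.05):
    """Weight samples by how close their mean probability is to 0.5.

    p * (1 - p) peaks at p = 0.5; using it as the closeness measure
    is an assumption of this sketch.
    """
    p_mean = prob_history.mean(axis=1)
    w = p_mean * (1.0 - p_mean) + eps
    return w / w.mean()

def weighted_loss(per_sample_loss, weights):
    """Mean of the per-sample losses scaled by the sample weights."""
    return float(np.mean(weights * per_sample_loss))

# Toy prediction history for three samples over four iterations.
history = np.array([
    [0.95, 0.96, 0.97, 0.96],  # consistently easy
    [0.20, 0.80, 0.30, 0.70],  # fluctuating, hence uncertain
    [0.05, 0.04, 0.06, 0.05],  # consistently hard
])
w_var = variance_weights(history)
w_thr = threshold_closeness_weights(history)
```

In this toy history, both weightings emphasize the fluctuating second sample over the consistently easy and consistently hard ones, which is the behavior that distinguishes the approach from self-paced learning and hard example mining.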

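This paper proposes two modified stochastic gradient descent methods based on lightweight estimates of sample uncertainty. Training samples are re-weighted by the variance of the predicted probability of the correct class across SGD iterations, or by the proximity of the correct class probability to the decision threshold. Experimental results show that our methods reliably improve accuracy across various network architectures, including additional gains on top of other popular training techniques such as residual learning, momentum, ADAM, batch normalization, dropout, and distillation.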

Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples