Semi-supervised learning aims to take advantage of a large amount of unlabeled data to improve the accuracy of a model that only has access to a small number of labeled examples. We propose curriculum labeling, an approach that exploits pseudo-labeling for propagating labels to unlabeled samples in an iterative and self-paced fashion. This approach is surprisingly simple and effective and surpasses or is comparable with the best methods proposed in the recent literature across all the standard benchmarks for image classification. Notably, we obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled samples, and 88.56% top-5 accuracy on Imagenet-ILSVRC using 128,000 labeled samples. In contrast to prior works, our approach shows improvements even in a more realistic scenario that leverages out-of-distribution unlabeled data samples.

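This paper revisits the idea of pseudo-labeling and proposes a semi-supervised learning method that uses a trained model to assign pseudo-labels to samples in the unlabeled set, then repeats this process iteratively to train the model. Experiments show that pseudo-labeling can match or even outperform current state-of-the-art methods while being more robust to out-of-distribution samples. The authors identify two key factors behind this result: applying curriculum learning principles, and restarting the model parameters before each self-training cycle. On CIFAR-10, the method reaches 94.91% accuracy using only 4,000 labeled samples; on Imagenet-ILSVRC, it reaches 68.87% top-1 accuracy using only 10% of the labeled samples.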
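
The self-training loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the deep network, and the function names (`fit_centroids`, `predict`, `curriculum_labeling`), the `step` parameter, and the rule of admitting a growing top-confidence share of the unlabeled set each cycle are all assumptions made for illustration. The sketch keeps only the two ingredients named in the summary: a self-paced curriculum over pseudo-labels, and a model refit from scratch before each cycle.

```python
import math


def fit_centroids(points, labels, n_classes):
    """Train a fresh model from scratch: one mean vector per class.

    Assumes every class appears at least once in `labels`.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(points, labels):
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


def predict(centroids, x):
    """Return (pseudo_label, confidence) via softmax over negative squared distances."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    label = logits.index(top)
    return label, exps[label] / sum(exps)


def curriculum_labeling(x_lab, y_lab, x_unl, n_classes, step=20):
    """Self-paced pseudo-labeling (illustrative schedule, not the paper's exact one)."""
    # Cycle 0: train on the labeled samples only.
    model = fit_centroids(x_lab, y_lab, n_classes)
    for pct in range(step, 101, step):
        # Pseudo-label every unlabeled sample with the current model.
        scored = [predict(model, x) for x in x_unl]
        # Rank by confidence and admit only the top `pct` percent this cycle.
        order = sorted(range(len(x_unl)), key=lambda i: scored[i][1], reverse=True)
        keep = order[: math.ceil(len(order) * pct / 100)]
        x_train = list(x_lab) + [x_unl[i] for i in keep]
        y_train = list(y_lab) + [scored[i][0] for i in keep]
        # Restart: fit a new model instead of fine-tuning the previous one.
        model = fit_centroids(x_train, y_train, n_classes)
    return model
```

Calling `curriculum_labeling(x_lab, y_lab, x_unl, n_classes=2)` with lists of feature tuples returns the centroids of the final model. Because each cycle refits from scratch, mistakes in early pseudo-labels are not carried over through the weights, which is the motivation the summary gives for restarting the parameters.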

Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning