In semi-supervised learning (SSL), the conventional approach trains a learner on a small amount of labeled data together with a large amount of unlabeled data, both drawn from the same underlying distribution. For deep learning models, however, this standard practice may not yield optimal results. In this work, we propose an alternative perspective: a distribution that is more easily separable may benefit the learner more than the original distribution does. To this end, we present PruneSSL, a practical technique that selectively removes examples from the original unlabeled dataset to enhance its separability. Our empirical study shows that although PruneSSL reduces the amount of training data available to the learner, it significantly improves the performance of several competitive SSL algorithms, achieving state-of-the-art results on several image classification tasks.

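This work proposes PruneSSL, a practical technique for enhancing the separability of the original unlabeled dataset. An empirical study shows that although PruneSSL reduces the amount of training data available to the learner, it significantly improves the performance of several competitive semi-supervised learning algorithms, achieving state-of-the-art results on multiple image classification tasks.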

Pruning Unlabeled Data to Improve Semi-Supervised Learning