Training Subset Selection for Weak Supervision

June 2022

Hunter Lang, Aravindan Vijayaraghavan, David Sontag

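TL;DR: This paper studies weakly supervised machine learning. It proposes combining pretrained data representations with the cut statistic to select a high-quality subset of the weakly labeled data, which improves the performance of weakly supervised models, with a reported accuracy gain of 19%.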
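
The selection step summarized above can be illustrated with a minimal sketch. This is not the authors' code. It assumes precomputed representation vectors, a directed k-nearest-neighbor graph, edge weights of the form 1 / (1 + distance), and a null hypothesis in which neighbors' weak labels are independent draws from the empirical label frequencies. The function name `cut_statistic_subset` and the parameters `k` and `keep_fraction` are hypothetical and chosen for illustration only.

```python
import numpy as np


def cut_statistic_subset(features, weak_labels, k=5, keep_fraction=0.5):
    """Rank weakly labeled examples by a cut-statistic z-score and keep the lowest.

    Assumed setup (not taken from the paper's code): `features` is an (n, d)
    array of pretrained representations and `weak_labels` is a length-n array
    of weak labels for the covered examples.
    """
    features = np.asarray(features, dtype=float)
    weak_labels = np.asarray(weak_labels)
    n = len(weak_labels)

    # Pairwise Euclidean distances; the diagonal is masked so that a point
    # is never its own neighbor.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)

    # Directed k-nearest-neighbor graph with distance-based edge weights.
    neighbors = np.argsort(dists, axis=1)[:, :k]
    neighbor_dists = np.take_along_axis(dists, neighbors, axis=1)
    weights = 1.0 / (1.0 + neighbor_dists)

    # Empirical frequency of each example's own weak label.
    classes, counts = np.unique(weak_labels, return_counts=True)
    freq = dict(zip(classes.tolist(), (counts / n).tolist()))
    p_own = np.array([freq[y] for y in weak_labels.tolist()])

    # Observed cut weight: total weight of edges to neighbors whose weak
    # label differs from this example's weak label.
    disagree = weak_labels[neighbors] != weak_labels[:, None]
    cut = (weights * disagree).sum(axis=1)

    # Mean and variance of the cut weight under the independence null.
    mean = (1.0 - p_own) * weights.sum(axis=1)
    var = p_own * (1.0 - p_own) * (weights ** 2).sum(axis=1)
    z = (cut - mean) / np.sqrt(var + 1e-12)

    # A low z-score means the neighborhood agrees with the weak label more
    # than chance would predict, so those examples are kept.
    n_keep = max(1, int(round(keep_fraction * n)))
    kept = np.argsort(z)[:n_keep]
    return kept, z
```

The returned indices could then be used to train the end model on the retained examples only. Here `keep_fraction` plays the role of the tradeoff between the amount of weakly labeled data and its label quality, and would in practice need to be tuned.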

Abstract

Existing weak supervision approaches use all the data covered by weak signals to train a classifier. We show both theoretically and empirically that this is not always optimal. Intuitively, there is a tradeoff between the amount of weakly-labeled data and the precision of the weak labe