Feb, 2022
LST: Lexicon-Guided Self-Training for Few-Shot Text Classification
Hazel Kim, Jaeman Son, Yo-Sub Han
TL;DR
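This paper introduces LST, a simple self-training method that uses a lexicon to guide the pseudo-labeling mechanism. The lexicon is refined in a linguistically enriched manner and used to predict confidence on unseen data, which in turn teaches better pseudo-labels. With 30 labeled samples per class, LST improves performance by 1.0-2.0% on five benchmark datasets.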
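The idea of lexicon-guided pseudo-labeling can be illustrated with a toy sketch. This is not the authors' implementation: the scoring rule (share of lexicon hits per class), the confidence threshold, the frequency-based lexicon refinement, and all function names (`lexicon_scores`, `pseudo_label`, `refine_lexicon`) are assumptions made for illustration only. No classifier is trained here; the sketch only shows how a lexicon could score unseen text, gate pseudo-labels by confidence, and then be refined from the accepted examples.

```python
from collections import Counter


def lexicon_scores(tokens, lexicon):
    """Return, for each class, its share of all lexicon hits in the tokens."""
    hits = {
        label: sum(1 for tok in tokens if tok in words)
        for label, words in lexicon.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No lexicon word matched: no class gets any confidence.
        return {label: 0.0 for label in lexicon}
    return {label: count / total for label, count in hits.items()}


def pseudo_label(texts, lexicon, threshold=0.7):
    """Keep only texts whose best lexicon score reaches the threshold.

    The threshold value is an arbitrary assumption for this sketch.
    """
    labeled = []
    for text in texts:
        scores = lexicon_scores(text.lower().split(), lexicon)
        label, conf = max(scores.items(), key=lambda kv: kv[1])
        if conf >= threshold:
            labeled.append((text, label, conf))
    return labeled


def refine_lexicon(lexicon, labeled, top_k=2):
    """Return a new lexicon extended with frequent unseen words per class.

    Frequency-based expansion is a stand-in assumption; the input lexicon
    is left unmodified.
    """
    refined = {label: set(words) for label, words in lexicon.items()}
    known = set().union(*lexicon.values())
    for label in lexicon:
        counts = Counter(
            tok
            for text, lab, _ in labeled
            if lab == label
            for tok in text.lower().split()
            if tok not in known
        )
        refined[label].update(word for word, _ in counts.most_common(top_k))
    return refined
```

In a loop, one would alternate `pseudo_label` and `refine_lexicon` over the unlabeled pool, so that the lexicon grows as more confident pseudo-labels are accepted.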
Abstract
Self-training provides an effective means of using an extremely small amount of labeled data to create pseudo-labels for unlabeled data. Many state-of-the-art self-training approaches hinge on different