Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that only learn from the small training set still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples with regularization for better generalization and stability. Our approach FewGen achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods, improving no-augmentation methods by 5+ average points, and outperforming augmentation methods by 3+ average points.
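The weighted maximum likelihood objective mentioned above can be illustrated with a minimal sketch. It assumes the per-token weights are already given; in the abstract they are adjusted automatically by a discriminative meta-learning objective, which is not reproduced here. The function names `weighted_nll` and `softmax_weights`, the example probabilities, and the use of a softmax to normalize the weights are illustrative assumptions, not the authors' implementation.

```python
import math


def weighted_nll(token_probs, weights):
    """Weighted negative log-likelihood of one generated sequence.

    token_probs[j] is the generator's probability for the j-th target token,
    weights[j] is that token's weight (hypothetical input; the paper learns
    these weights, this sketch does not).
    """
    if len(token_probs) != len(weights):
        raise ValueError("token_probs and weights must have the same length")
    return -sum(w * math.log(p) for p, w in zip(token_probs, weights))


def softmax_weights(raw_scores):
    """Turn raw per-token scores into weights that sum to 1 (an assumed choice)."""
    top = max(raw_scores)
    exps = [math.exp(s - top) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sequence of three tokens with made-up generator probabilities.
probs = [0.9, 0.5, 0.1]

# Uniform weights recover the ordinary (length-averaged) maximum likelihood loss.
uniform = [1.0 / len(probs)] * len(probs)
loss_uniform = weighted_nll(probs, uniform)

# Shifting weight onto the third token makes that token dominate the loss.
focused = softmax_weights([0.0, 0.0, 2.0])
loss_focused = weighted_nll(probs, focused)
```

With uniform weights the loss is the usual averaged token-level negative log-likelihood; raising the weight of a token increases how strongly the generator is pushed to fit that token, which is the lever a learned weighting could use to emphasize label-discriminative tokens.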

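This work studies few-shot learning with PLMs from a different perspective: a PLM is tuned on the few-shot samples and then used as a generator to synthesize a large number of new training samples. The generator is trained with weighted maximum likelihood to encourage it to produce label-discriminative samples, and a classifier is then fine-tuned with regularization on both the few-shot and the synthetic samples. The approach achieves better results than existing few-shot learning methods on the GLUE benchmark, outperforming no-augmentation methods by 5+ average points and augmentation methods by 3+ average points.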

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning