Within the context of reading comprehension, the task of Distractor
Generation (DG) aims to generate several incorrect options to confuse readers.
Traditional supervised methods for DG rely heavily on expensive human-annotated
distractor labels. In this paper, we propose an unsupervised DG framework,
leveraging Large Language Models (LLMs) as cost-effective annotators to enhance
the DG capability of smaller student models. Specifically, to perform knowledge
distillation, we propose a dual-task training strategy that integrates pseudo
distractors from LLMs and the original answer information as the objective
targets in a two-stage training process. Moreover, we devise a counterfactual
contrastive decoding mechanism for increasing the distracting capability of the
DG model. Experiments show that our unsupervised generation method with
Bart-base greatly surpasses GPT-3.5-turbo while using 200 times fewer
model parameters. Our proposed unsupervised DG method offers a cost-effective
framework for practical reading comprehension applications, without the need for
laborious distractor annotation or costly large models.
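
The abstract does not give the decoding rule itself. As a rough illustration of the general idea behind contrastive decoding, the sketch below scores each candidate token by its log-probability under the factual input minus a weighted log-probability under a counterfactual input. The function names, the `alpha` weight, and the toy logits are assumptions made for illustration and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(factual_logits, counterfactual_logits, alpha=0.5):
    """Hypothetical rule: log p_factual(token) - alpha * log p_counterfactual(token)."""
    lp_f = log_softmax(factual_logits)
    lp_c = log_softmax(counterfactual_logits)
    return [f - alpha * c for f, c in zip(lp_f, lp_c)]


def pick_token(factual_logits, counterfactual_logits, alpha=0.5):
    """Greedy choice of the token index with the highest contrastive score."""
    scores = contrastive_scores(factual_logits, counterfactual_logits, alpha)
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy vocabulary of three tokens (illustrative numbers only).
factual = [2.0, 1.9, 0.0]          # next-token logits given the real input
counterfactual = [3.0, 0.0, 0.0]   # logits given an altered (counterfactual) input

greedy_choice = pick_token(factual, counterfactual, alpha=0.0)
contrastive_choice = pick_token(factual, counterfactual, alpha=1.0)
```

With `alpha = 0` the rule reduces to ordinary greedy decoding; larger values penalize tokens that the counterfactual input also favors, so the choice can move to a token that is specific to the factual input.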

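In the context of reading comprehension, we propose an unsupervised distractor generation framework that uses large language models as cost-effective annotators to enhance the distractor generation capability of smaller student models. Experiments show that the proposed unsupervised generation method greatly surpasses GPT-3.5-turbo while using 200 times fewer parameters. The method offers a cost-effective framework for practical reading comprehension applications, without the need for laborious distractor annotation or costly large models.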

Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

Modern machine learning suffers from catastrophic forgetting when learning
new classes incrementally. The performance dramatically degrades due to the
missing data of old classes. Incremental learning methods have been proposed to
retain the knowledge acquired from the old classes by using knowledge
distillation and keeping a few exemplars from the old classes. However, these
methods struggle to scale up to a large number of classes. We believe this is
because of the combination of two factors: (a) the data imbalance between the
old and new classes, and (b) the increasing number of visually similar classes.
Distinguishing between an increasing number of visually similar classes is
particularly challenging when the training data is unbalanced. We propose a
simple and effective method to address this data imbalance issue. We found that
the last fully connected layer has a strong bias towards the new classes, and
this bias can be corrected by a linear model. With two bias parameters, our
method performs remarkably well on two large datasets: ImageNet (1000 classes)
and MS-Celeb-1M (10000 classes), outperforming the state-of-the-art algorithms
by 11.1% and 13.2% respectively.
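
To make the two-parameter correction concrete, here is a minimal sketch of a linear adjustment applied only to the new-class outputs of the last fully connected layer. The function names, the parameter names `alpha` and `beta`, and the toy logits are assumptions for illustration; the abstract does not describe how the two parameters are estimated.

```python
def correct_bias(logits, num_old, alpha, beta):
    """Keep old-class logits as they are; map each new-class logit o to alpha * o + beta.

    Assumes the first `num_old` entries of `logits` belong to old classes
    and the remaining entries belong to newly added classes.
    """
    return [o if k < num_old else alpha * o + beta
            for k, o in enumerate(logits)]


def predict(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy example: classes 0-1 are old, classes 2-3 are new and over-scored.
raw = [2.0, 1.5, 3.0, 2.8]
corrected = correct_bias(raw, num_old=2, alpha=0.5, beta=0.0)

raw_prediction = predict(raw)
corrected_prediction = predict(corrected)
```

In this toy case the uncorrected logits favor a new class, while the corrected logits favor an old class, which is the kind of shift the linear model is meant to produce when the classifier is biased towards new classes.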

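This paper proposes and validates a method for correcting the data imbalance between old and new classes. It uses a linear model to correct the classification bias of the fully connected layer and outperforms existing algorithms on two large datasets, ImageNet and MS-Celeb-1M.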