Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, counterfactual data is often limited in scale and diversity when crowdsourced, and computationally expensive to extend to new perturbation types when generated with supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters these generations to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples on three evaluation sets written by human workers and via human-AI collaboration.
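The generate-then-filter loop described above can be sketched for a natural language inference (NLI) task as follows. This is a minimal illustration under stated assumptions, not DISCO's actual implementation: the prompt template in `build_prompt`, the callables `llm` and `teacher`, and the confidence `threshold` are all hypothetical stand-ins, since the abstract does not specify the prompts or the exact filtering criterion. Here a phrase in the premise is masked, the language model proposes replacement phrases (the phrasal perturbations), and the teacher keeps a candidate only if it confidently predicts the intended new label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


def build_prompt(ex: Example, span: str, target_label: str) -> str:
    """Mask one phrase of the premise and ask the LLM for a replacement that
    should flip the pair to `target_label`. Hypothetical template."""
    masked = ex.premise.replace(span, "[blank]", 1)
    return (
        f"Premise: {masked}\n"
        f"Hypothesis: {ex.hypothesis}\n"
        f"Fill in [blank] so that the relation is '{target_label}'.\n"
        "Answer:"
    )


def generate_counterfactuals(
    ex: Example,
    span: str,
    target_label: str,
    llm: Callable[[str], List[str]],  # hypothetical: prompt -> candidate phrases
    teacher: Callable[[str, str], Dict[str, float]],  # hypothetical: label probabilities
    threshold: float = 0.9,  # assumed confidence cut-off for the filter
) -> List[Example]:
    prompt = build_prompt(ex, span, target_label)
    kept: List[Example] = []
    for filler in llm(prompt):
        new_premise = ex.premise.replace(span, filler.strip(), 1)
        if new_premise == ex.premise:
            continue  # identical to the original, so not a perturbation
        probs = teacher(new_premise, ex.hypothesis)
        predicted = max(probs, key=probs.get)
        # Teacher filter: keep only candidates confidently assigned the target label.
        if predicted == target_label and probs[predicted] >= threshold:
            kept.append(Example(new_premise, ex.hypothesis, target_label))
    return kept


# Toy stand-ins so the sketch runs without any model; both are hypothetical.
def toy_llm(prompt: str) -> List[str]:
    return ["running", "sleeping", "napping"]


def toy_teacher(premise: str, hypothesis: str) -> Dict[str, float]:
    if "running" in premise:
        return {"entailment": 0.02, "neutral": 0.03, "contradiction": 0.95}
    return {"entailment": 0.90, "neutral": 0.05, "contradiction": 0.05}


if __name__ == "__main__":
    original = Example("A man is sleeping on the couch.", "A man is sleeping.", "entailment")
    kept = generate_counterfactuals(original, "sleeping", "contradiction", toy_llm, toy_teacher)
    print([c.premise for c in kept])  # ['A man is running on the couch.']
```

In this toy run, "sleeping" is discarded because it reproduces the original premise, and "napping" is rejected because the teacher still predicts entailment; only "running" survives as a counterfactual labeled contradiction. In a real setting, `llm` would wrap a large language model and `teacher` a task-specific classifier.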

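The paper proposes a new framework called DISCO that uses a large language model to generate high-quality counterfactual data and a task-specific teacher model to filter the generations, with the aim of improving model robustness and generalization. Experiments show that a student model trained this way is 6% (absolute) more robust and generalizes 5% better across distributions than the baselines.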

DISCO: Distilling Phrasal Counterfactuals with Large Language Models