Bias-measuring datasets play a critical role in detecting biased behavior of language models and in evaluating the progress of bias mitigation methods. In this work, we focus on evaluating gender bias through coreference resolution, where previous datasets are either hand-crafted or fail to reliably measure an explicitly defined bias. To overcome these shortcomings, we propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation, and construct Counter-GAP, an annotated dataset consisting of 4008 instances grouped into 1002 quadruples. We further identify a bias cancellation problem in previous group-level metrics on Counter-GAP, and propose to use the difference between inconsistency across genders and inconsistency within genders to measure bias at the quadruple level. Our results show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group, and that a name-based counterfactual data augmentation method is more effective at mitigating such bias than an anonymization-based method.
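
As an illustration of the quadruple-level idea, the following is a minimal sketch rather than the exact metric definition, which the abstract does not spell out. It assumes that each quadruple holds four variants of the same text with a gender label per variant (two per gender), that a model yields one discrete prediction per variant, and that inconsistency is the fraction of variant pairs whose predictions disagree. The function names `inconsistency` and `quadruple_bias` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def inconsistency(pairs, preds):
    """Fraction of index pairs whose predictions disagree (assumed definition)."""
    return mean(1.0 if preds[i] != preds[j] else 0.0 for i, j in pairs)


def quadruple_bias(preds, genders):
    """Across-gender inconsistency minus within-gender inconsistency.

    preds:   four model predictions, one per variant in the quadruple.
    genders: four gender labels, e.g. ["M", "M", "F", "F"] (assumed layout).
    """
    all_pairs = list(combinations(range(4), 2))
    within = [(i, j) for i, j in all_pairs if genders[i] == genders[j]]
    across = [(i, j) for i, j in all_pairs if genders[i] != genders[j]]
    return inconsistency(across, preds) - inconsistency(within, preds)


genders = ["M", "M", "F", "F"]
# Predictions flip exactly when the gender flips: maximal score.
flipped = quadruple_bias(["A", "A", "B", "B"], genders)
# Predictions never change: no inconsistency of either kind.
stable = quadruple_bias(["A", "A", "A", "A"], genders)
print(flipped, stable)
```

Because the score is computed per quadruple before any averaging, a quadruple where the model favors one gender cannot be offset by another quadruple where it favors the other, which is one plausible reading of how a quadruple-level measure avoids the cancellation seen in group-level metrics.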

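This paper proposes a new method for collecting diverse, natural, and minimally distant text pairs via counterfactual generation, and constructs Counter-GAP, an annotated dataset of 4008 instances grouped into 1002 quadruples, to evaluate gender bias of language models in coreference resolution. The authors use a quadruple-level metric to address the bias cancellation problem of earlier metrics, and find that four pre-trained language models are significantly more inconsistent across gender groups than within each group, and that a name-based counterfactual data augmentation method reduces this bias more effectively than an anonymization-based method.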

Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns