Reporting harmful bias in NLP applications, and providing test sets for it, is essential for building a robust understanding of the current problem. We present a new observation of gender bias in a downstream NLP application: marked attribute bias in natural language inference. Bias in downstream applications can stem from training data or word embeddings, or can be amplified by the model in use. However, focusing on biased word embeddings is potentially the most impactful first step due to their universal nature. Here we seek to understand how the intrinsic properties of word embeddings contribute to this observed marked attribute effect, and whether current post-processing methods address the bias successfully. An investigation of the current debiasing landscape reveals two open problems: none of the current debiased embeddings mitigate the marked attribute error, and none of the intrinsic bias measures are predictive of the marked attribute effect. Noting that a new type of intrinsic bias measure correlates meaningfully with the marked attribute effect, we propose a new post-processing debiasing scheme for static word embeddings. Applied to existing embeddings, the proposed method achieves new best results on the marked attribute bias test set. See https://github.com/hillary-dawkins/MAB.

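Reporting harmful bias in NLP applications and providing test sets for it is essential for building a robust understanding of the current problem. By observing a new marked attribute bias, we characterize gender bias in a downstream NLP application. Targeting biased word embeddings is potentially the most impactful first step toward removing bias. We study how the intrinsic properties of word embeddings contribute to the observed marked attribute effect and whether current post-processing methods address the bias successfully. Surveying current debiasing approaches reveals two problems: none of the current debiased embeddings mitigate the marked attribute error, and none of the intrinsic bias measures predict the marked attribute effect. Noting that a new type of intrinsic bias measure correlates meaningfully with the marked attribute effect, we propose a new post-processing debiasing method for static word embeddings. Applied to existing embeddings, the proposed method achieves the best results on the marked attribute bias test set.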

Marked Attribute Bias in Natural Language Inference