To effectively tackle sexism online, research has focused on automated methods for detecting sexism. In this paper, we use items from psychological scales and adversarial sample generation to 1) provide a codebook for different types of sexism in theory-driven scales and in social media text; 2) test the performance of different sexism detection methods across multiple data sets; 3) provide an overview of strategies employed by humans to remove sexism through minimal changes. Results highlight that current methods seem inadequate in detecting all but the most blatant forms of sexism and do not generalize well to out-of-domain examples. By providing a scale-based codebook for sexism and insights into what makes a statement sexist, we hope to contribute to the development of better and broader models for sexism detection, including reflections on theory-driven approaches to data collection.

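This paper proposes a codebook that categorizes sexism along different dimensions drawn from psychological scales, and applies that codebook to annotate existing and new social media data sets, in order to generate adversarial examples and test the reliability of current machine learning models for sexism detection. Although existing models can only recognize a limited set of linguistic markers, including diverse data and adversarial examples during training can improve model generalizability and robustness.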

"Call me sexist, but...": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples