Trained classification models can unintentionally lead to biased
representations and predictions, which can reinforce societal preconceptions
and stereotypes. Existing debiasing methods for classification models, such as
adversarial training, are often expensive to train and difficult to optimize.
This paper proposes DualFair, a self-supervised model that removes biases tied to sensitive attributes such as gender and race from learned representations while jointly optimizing two fairness criteria, group fairness and counterfactual fairness, yielding fairer predictions for both groups and individuals. Detailed analyses on multiple datasets demonstrate the model's effectiveness and further reveal a synergistic effect from addressing the two fairness criteria simultaneously, suggesting that the model holds promise for fair intelligent Web applications.
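To make the two fairness criteria concrete, here is a minimal, hypothetical sketch of how each might be measured on a trained classifier. The function names, the toy data, and the simplistic flip-the-attribute counterfactual are illustrative assumptions, not the paper's actual evaluation protocol.

```python
# Illustrative sketch (not the paper's method): measuring the two
# fairness criteria named in the abstract on a toy binary classifier.

def demographic_parity_gap(preds, groups):
    """Group fairness proxy: absolute difference in positive-prediction
    rates between sensitive groups 0 and 1 (demographic parity)."""
    def positive_rate(g):
        members = [p for p, s in zip(preds, groups) if s == g]
        return sum(members) / max(1, len(members))
    return abs(positive_rate(0) - positive_rate(1))

def counterfactual_consistency(model, inputs, flip_attribute):
    """Counterfactual fairness proxy: fraction of inputs whose prediction
    is unchanged when only the sensitive attribute is flipped."""
    return sum(model(x) == model(flip_attribute(x)) for x in inputs) / len(inputs)

# Toy classifier that ignores the sensitive attribute entirely,
# so its counterfactual consistency is perfect.
model = lambda x: int(x["score"] > 0.5)
flip = lambda x: {**x, "gender": 1 - x["gender"]}
data = [{"score": 0.9, "gender": 0}, {"score": 0.2, "gender": 1}]

gap = demographic_parity_gap(preds=[1, 0, 1, 1], groups=[0, 0, 1, 1])
consistency = counterfactual_consistency(model, data, flip)
```

In this toy setup the parity gap is 0.5 (group 0 receives positive predictions half as often as group 1), while counterfactual consistency is 1.0 because the classifier never reads the sensitive attribute. A debiasing method such as DualFair aims to drive the gap toward 0 while keeping consistency near 1.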