Robustness is of central importance in machine learning and has given rise to the fields of domain generalization and invariant learning, which are concerned with improving performance on a test distribution distinct from but related to the training distribution. In light of recent work suggesting an intimate connection between fairness and robustness, we investigate whether algorithms from robust ML can be used to improve the fairness of classifiers that are trained on biased data and tested on unbiased data. We apply Invariant Risk Minimization (IRM), a domain generalization algorithm that employs a method inspired by causal discovery to find robust predictors, to the task of fairly predicting the toxicity of internet comments. We show that IRM achieves better out-of-distribution accuracy and fairness than Empirical Risk Minimization (ERM) methods, and analyze both the difficulties that arise when applying IRM in practice and the conditions under which IRM will likely be effective in this scenario. We hope that this work will inspire further studies of how robust machine learning methods relate to algorithmic fairness.

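This paper focuses on using robust machine learning algorithms to improve the fairness and robustness of classifiers that are trained on biased data and tested on unbiased data. The authors apply a domain generalization algorithm called Invariant Risk Minimization (IRM) to the task of fairly predicting the toxicity of internet comments. They find that IRM outperforms Empirical Risk Minimization (ERM) methods in both fairness and out-of-distribution accuracy.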

Fairness and Robustness in Invariant Learning: A Case Study in Toxicity Classification
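
The abstract contrasts IRM with ERM without spelling out how the two objectives differ. As a rough illustration only, the sketch below computes an ERM objective and an IRMv1-style objective for a binary classifier evaluated on several training environments. It assumes the commonly used IRMv1 penalty (the squared gradient of each environment's risk with respect to a scalar multiplier on the logits, evaluated at 1.0) and a logistic loss; the source does not state which variant or loss the authors used. All names here (`irmv1_penalty`, `irm_objective`, `penalty_weight`, the toy environments) are hypothetical, and averaging over environments rather than summing is a choice made for this sketch.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def bce_with_logits(logits, labels):
    # Mean binary cross-entropy computed directly from logits (stable form).
    return np.mean(
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )


def irmv1_penalty(logits, labels):
    # IRMv1-style penalty for one environment (assumed form):
    # risk(w) = BCE(w * logits, labels); penalty = (d risk / d w at w = 1)^2.
    # For the logistic loss the derivative is mean((sigmoid(z) - y) * z).
    grad_at_one = np.mean((sigmoid(logits) - labels) * logits)
    return grad_at_one ** 2


def erm_objective(envs):
    # ERM here simply averages the per-environment risks.
    return float(np.mean([bce_with_logits(z, y) for z, y in envs]))


def irm_objective(envs, penalty_weight):
    # Average risk plus a weighted average invariance penalty.
    risks = [bce_with_logits(z, y) for z, y in envs]
    penalties = [irmv1_penalty(z, y) for z, y in envs]
    return float(np.mean(risks) + penalty_weight * np.mean(penalties))


if __name__ == "__main__":
    # Toy data: two hypothetical environments of (logits, labels).
    rng = np.random.default_rng(0)
    envs = []
    for shift in (0.5, -0.5):
        labels = rng.integers(0, 2, size=200).astype(float)
        logits = 2.0 * (labels - 0.5) + shift + rng.normal(size=200)
        envs.append((logits, labels))
    print("ERM objective:", erm_objective(envs))
    print("IRM objective:", irm_objective(envs, penalty_weight=10.0))
```

In a training loop the logits would come from a model, and the objective would be minimized with respect to the model's parameters. The penalty is small when the same classifier is simultaneously close to optimal in every environment, which is the invariance property that IRM aims for.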