Text representations learned by machine learning models often encode undesirable demographic information about the user. Predictive models based on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes demographic information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function. FaRM can debias representations with or without a target task at hand, and it can be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and that the learned representations leak significantly less protected attribute information under attack by a non-linear probing network.
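
The abstract does not spell out the objective, so the following is only a minimal sketch of the underlying idea, assuming the standard finite-sample coding-rate estimate from the rate-distortion literature, R(Z, ε) = ½ log det(I + d/(nε²) ZᵀZ), for n representations of dimension d. The function names `coding_rate` and `group_coding_rate`, the distortion value `eps=0.5`, and the per-class weighting n_j/n are illustrative assumptions and not the authors' implementation. Under this assumption, representations that collapse onto one direction have a low rate, while representations that are spread out and uncorrelated have a high rate, so maximizing the rate within each protected attribute class pushes same-class instances apart.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Assumed coding-rate estimate for representations Z of shape (n, d).

    R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z)
    """
    n, d = Z.shape
    scaled = np.eye(d) + (d / (n * eps**2)) * (Z.T @ Z)
    # slogdet is numerically safer than log(det(...)) for larger d.
    _, logdet = np.linalg.slogdet(scaled)
    return 0.5 * logdet


def group_coding_rate(Z, protected, eps=0.5):
    """Weighted sum of coding rates computed within each protected class.

    `protected` is a length-n array of class labels (hypothetical input).
    Each class j contributes (n_j / n) * R(Z_j, eps).
    """
    n = Z.shape[0]
    total = 0.0
    for label in np.unique(protected):
        Z_j = Z[protected == label]
        total += (Z_j.shape[0] / n) * coding_rate(Z_j, eps)
    return total


def unit_rows(X):
    """Normalize each row to unit length so rates are comparable."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


rng = np.random.default_rng(0)
n, d = 200, 8
protected = np.repeat([0, 1], n // 2)

# Collapsed: every instance in a class shares one direction (highly correlated).
directions = unit_rows(rng.normal(size=(2, d)))
Z_collapsed = directions[protected]

# Spread: instances in a class point in random directions (roughly uncorrelated).
Z_spread = unit_rows(rng.normal(size=(n, d)))

rate_collapsed = group_coding_rate(Z_collapsed, protected)
rate_spread = group_coding_rate(Z_spread, protected)
print(f"collapsed: {rate_collapsed:.3f}  spread: {rate_spread:.3f}")
```

In a debiasing setup of this kind, a quantity like `group_coding_rate` would be the term an encoder is trained to increase; the sketch only evaluates it on fixed arrays and does no training.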

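This paper proposes a new debiasing method, Fairness-aware Rate Maximization (FaRM), which removes protected information by using the rate-distortion function to make the representations of instances belonging to the same protected attribute class uncorrelated. It can debias representations with or without a target task. Experimental evaluation shows that FaRM achieves state-of-the-art performance on multiple datasets, and that the learned representations leak very little protected attribute information under attack by a non-linear probing network.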

Learning Fair Representations via Rate-Distortion Maximization