Pretrained language models (PLMs), such as GPT2, have achieved remarkable empirical performance in text generation tasks. However, because PLMs are pretrained on large-scale natural language corpora, the text they generate may exhibit social bias against disadvantaged demographic groups. To improve the fairness of PLMs in text generation, we propose to minimize the mutual information between the semantics of the generated sentences and their demographic polarity, i.e., the demographic group to which a sentence refers. In this way, the mention of a demographic group (e.g., male or female) is encouraged to be independent of how that group is described in the generated text, which alleviates the social bias. Moreover, we propose to efficiently estimate an upper bound of this mutual information via importance sampling, leveraging a natural language corpus. We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing. Empirical results on real-world benchmarks demonstrate that the proposed method yields superior performance in terms of both fairness and language modeling ability.
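As a generic illustration of the importance sampling idea mentioned above, the following minimal sketch estimates an expectation under a target distribution using samples drawn from a different proposal distribution. It is not the mutual information estimator of this work: the Gaussian target and proposal, the test function, and all names (`normal_pdf`, `importance_sampling_estimate`) are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(f, target_pdf, proposal_pdf, proposal_sampler, n, rng):
    """Estimate E_target[f(x)] from n samples of the proposal distribution.

    Each sample is reweighted by target_pdf(x) / proposal_pdf(x), which
    corrects for having sampled from the proposal instead of the target.
    """
    total = 0.0
    for _ in range(n):
        x = proposal_sampler(rng)
        weight = target_pdf(x) / proposal_pdf(x)
        total += weight * f(x)
    return total / n


# Hypothetical setup: target N(1, 1), proposal N(0, 1.5^2), f(x) = x^2.
# The exact value of E_target[x^2] is mean^2 + variance = 2.
rng = random.Random(0)
estimate = importance_sampling_estimate(
    f=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 1.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 1.5),
    proposal_sampler=lambda r: r.gauss(0.0, 1.5),
    n=50_000,
    rng=rng,
)
print(f"importance sampling estimate: {estimate:.3f}")
```

The proposal is deliberately wider than the target so that the importance weights stay bounded; a proposal with lighter tails than the target can make the estimate very noisy.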

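To reduce social bias in PLM text generation, this paper proposes minimizing the mutual information between the semantics of the generated text and its demographic polarity, so that the mention of a demographic group is independent of how that group is described, which alleviates social bias. An upper bound of the mutual information is estimated efficiently via importance sampling, and a distillation mechanism preserves the language modeling ability of the debiased PLM. Experimental results show that the method performs strongly in both fairness and language modeling ability.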

Fairness in Text Generation via Mutual Information Minimization Based on Importance Sampling