Existing debiasing methods inevitably make unreasonable or undesired predictions because they are designed and evaluated to achieve parity across social groups while disregarding individual facts, which ends up modifying existing knowledge. In this paper, we first establish a new bias mitigation benchmark, BiasKE, which draws on existing and newly constructed datasets and systematically assesses debiasing performance with complementary metrics on fairness, specificity, and generalization. We also propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration of individual pieces of biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines in debiasing performance without hampering the model's overall capability for knowledge preservation, highlighting the promise of fine-grained debiasing strategies for editable fairness in LLMs.

Large Language Model Bias Mitigation from the Perspective of Knowledge Editing