March 2025
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Xin Xu, Wei Xu, Ningyu Zhang, Julian McAuley
TL;DR
This work addresses stereotypical bias in language models by proposing BiasEdit, an efficient model-editing method. Lightweight networks generate parameter updates that remove bias while preserving the model's language generation ability. Experiments show that BiasEdit outperforms prior debiasing approaches while having little impact on the model's overall capabilities.
Abstract
Previous studies have established that language models manifest stereotyped biases. Existing debiasing strategies, such as retraining a model with counterfactual data, representation projection, and prompting, often fail to efficiently eliminate bias or to directly alter the models' biased internal representations.
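The summary above describes the core mechanism: a lightweight editor network produces a parameter update for part of the model, trained with a debiasing objective together with a retention objective that preserves language modeling. The sketch below illustrates that idea on a toy model; the toy LM, the low-rank editor, the specific loss forms, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of editing-based debiasing (not the BiasEdit code).
# Assumptions: a toy bigram-style LM stands in for the stereotyped model, and
# stereotypical / anti-stereotypical continuations are given as token ids.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32

class ToyLM(nn.Module):
    """Stand-in language model: embedding -> linear head over the vocabulary."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB, bias=False)   # weight to be edited

    def forward(self, tokens, delta=None):
        h = self.emb(tokens).mean(dim=1)                # crude context encoding
        w = self.head.weight + (delta if delta is not None else 0.0)
        return h @ w.t()                                # next-token logits

class Editor(nn.Module):
    """Lightweight editor that proposes a low-rank update for the head weight."""
    def __init__(self, rank=4):
        super().__init__()
        self.u = nn.Parameter(torch.zeros(VOCAB, rank))
        self.v = nn.Parameter(torch.randn(rank, DIM) * 0.01)

    def forward(self):
        return self.u @ self.v                          # (VOCAB, DIM) update

lm, editor = ToyLM(), Editor()
for p in lm.parameters():                               # the base model stays frozen
    p.requires_grad_(False)
opt = torch.optim.Adam(editor.parameters(), lr=1e-2)

# Toy data: contexts with a stereotypical / anti-stereotypical continuation,
# plus neutral contexts used to preserve general language-modeling behaviour.
ctx = torch.randint(0, VOCAB, (8, 5))
stereo_tok = torch.randint(0, VOCAB, (8,))
anti_tok = torch.randint(0, VOCAB, (8,))
neutral_ctx = torch.randint(0, VOCAB, (8, 5))

for step in range(200):
    delta = editor()
    logp = F.log_softmax(lm(ctx, delta), dim=-1)
    # Debiasing loss: push the edited model to assign equal probability to the
    # stereotypical and anti-stereotypical continuations.
    p_stereo = logp.gather(1, stereo_tok[:, None])
    p_anti = logp.gather(1, anti_tok[:, None])
    debias_loss = (p_stereo - p_anti).pow(2).mean()
    # Retention loss: keep edited outputs on neutral text close to the original model.
    with torch.no_grad():
        ref = F.log_softmax(lm(neutral_ctx), dim=-1)
    edited = F.log_softmax(lm(neutral_ctx, delta), dim=-1)
    retain_loss = F.kl_div(edited, ref, log_target=True, reduction="batchmean")
    loss = debias_loss + retain_loss
    opt.zero_grad(); loss.backward(); opt.step()

print(f"debias={debias_loss.item():.4f}  retain={retain_loss.item():.4f}")
```

The retention term is what keeps the edited model's behaviour on neutral text close to the original, which corresponds to the summary's claim that language-generation ability is preserved while the bias is edited away.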