BriefGPT.xyz
Aug, 2024
REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning
Rameez Qureshi, Naïm Es-Sebbani, Luis Galárraga, Yvette Graham, Miguel Couceiro...
TL;DR
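This paper studies the unintended biases that large language models inherit, in particular gender, geographic, and racial stereotypes. It proposes REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of bias without fine-tuning. Experiments show that the method significantly reduces stereotypical bias while preserving model performance, at low training cost.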
Abstract
With the introduction of (large) Language Models, there has been significant concern about the unintended bias such models may inherit from their training data. A number of studies have shown that such models propagate gender