Nov, 2023
Social Bias Probing: Fairness Benchmarking for Language Models
Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein
TL;DR
This study proposes a novel framework for probing social biases in language models. By curating a probing dataset and applying a novel fairness score, it finds that the biases encoded in language models are more nuanced than previously understood, and reveals that differing religious identities lead to the most pronounced disparate treatment across all models examined.
Abstract
Large language models have been shown to encode a variety of social biases, which carries the risk of downstream harms. While the impact of these biases has been recognized, prior methods for …