Bias Associating Muslims with Violence in GPT Models

Oct, 2023

Muslim-Violence Bias Persists in Debiased GPT Models

Babak Hemmatian, Razan Baltaji, Lav R. Varshney

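TL;DR: GPT-3 tends to generate violent completions about Muslims and exhibits anti-Muslim bias. Replication experiments suggest that debiasing measures are no longer effective in newer models, underscoring the need for debiasing that addresses higher-order associations.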

Abstract

Abid et al. (2021) showed a tendency in GPT-3 to generate violent completions when prompted about Muslims, compared with other religions. Two pre-registered replication attempts found few violent completions and