Using Natural Sentences for Understanding Biases in Language Models

May, 2022

Using Natural Sentences for Understanding Biases in Language Models

Sarah Alnegheimish, Alicia Guo, Yi Sun

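TL;DR: This paper builds an occupation-based corpus of natural sentences for evaluating bias in language models. Unlike prior work that relies solely on synthetic datasets, it shows that prompts built from natural sentences evaluate gender-occupation bias more accurately and systematically than prompts built from predefined templates.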

Abstract

Evaluation of biases in language models is often limited to synthetically generated datasets. This dependence traces back to the need for a prompt-style dataset to trigger specific behaviors of →