The rapid deployment of generative language models (LMs) has raised concerns
about social biases affecting the well-being of diverse consumers. The extant
literature on generative LMs has primarily examined bias via explicit identity
prompting. However, prior research on bias in earlier language-based technology
platforms, including search engines, has shown that discrimination can occur
even when identity terms are not specified explicitly. Studies of bias in LM
responses to open-ended prompts (where identity classifications are left
unspecified) are lacking and have not yet been grounded in end-consumer harms.
Here, we advance studies of generative LM bias by considering a broader set of
natural use cases via open-ended prompting. In this "laissez-faire" setting, we
find that synthetically generated texts from five of the most pervasive LMs
(ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of
omission, subordination, and stereotyping for minoritized individuals with
intersectional race, gender, and/or sexual orientation identities (AI/AN,
Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find
widespread evidence of bias, to the extent that such individuals are hundreds to
thousands of times more likely to encounter LM-generated outputs that portray
their identities in a subordinated manner than outputs with representative or
empowering portrayals. We also document the prevalence in LM-generated outputs
of stereotypes (e.g., the perpetual foreigner) that are known to trigger
psychological harms that disproportionately affect minoritized individuals.
These include stereotype threat, which leads to impaired cognitive performance
and increased negative self-perception. Our findings highlight the urgent need
to protect consumers from discriminatory harms caused by language models and
invest in critical AI education programs tailored towards empowering diverse
consumers.
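A "hundreds to thousands of times more likely" figure can be read as a ratio of likelihoods. The sketch below shows one way to compute such a ratio for a single identity group. It assumes each generated text has already been labeled with the role the group plays in it. The `subordination_ratio` function, the label strings, and the smoothing term are hypothetical and not taken from the paper, whose exact metric may be defined differently.

```python
from collections import Counter
from typing import Iterable


def subordination_ratio(labels: Iterable[str], smoothing: float = 0.5) -> float:
    """Return how many times more often a group is portrayed as subordinated
    than as empowered, given one portrayal label per generated text.

    `labels` holds hypothetical annotations such as "subordinated",
    "empowered", or "neutral"; labels other than the first two are ignored.
    Additive `smoothing` keeps the ratio finite when a group never appears
    in an empowering role (an assumption of this sketch, not of the paper).
    """
    counts = Counter(labels)  # missing labels count as 0
    return (counts["subordinated"] + smoothing) / (counts["empowered"] + smoothing)


# Toy annotations for one identity group (illustrative numbers, not results).
group_labels = ["subordinated"] * 300 + ["empowered"] * 3 + ["neutral"] * 10
print(subordination_ratio(group_labels, smoothing=0.0))  # 100.0
```

Smoothing matters most for groups that the models almost never place in empowering roles. For those groups an unsmoothed ratio would divide by zero.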

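Using open-ended prompts, we find that model-generated texts portray the identities of minoritized groups in ways that involve omission, subordination, and stereotyping, which can cause psychological harms and impaired cognitive performance.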

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

We present a framework for the automated measurement of responsible AI (RAI)
metrics for large language models (LLMs) and associated products and services.
Our framework for automatically measuring harms from LLMs builds on existing
technical and sociotechnical expertise and leverages the capabilities of
state-of-the-art LLMs, such as GPT-4. We use this framework to run through
several case studies investigating how different LLMs may violate a range of
RAI-related principles. The framework may be employed alongside domain-specific
sociotechnical expertise to create measurements for new harm areas in the
future. By implementing this framework, we aim to enable more advanced harm
measurement efforts and further the responsible use of LLMs.
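The abstract does not say how per-sample judgments from the evaluator LLM become a metric. The sketch below assumes a simple defect rate: the share of generated outputs whose severity score meets a threshold. `defect_rate`, the `Annotator` type, `toy_annotator`, and the threshold value are all hypothetical. A real pipeline would replace the keyword stand-in with calls to an evaluator model such as GPT-4.

```python
from typing import Callable, Sequence

# An annotator maps one generated output to an integer severity score.
# In an automated pipeline this role would be played by an evaluator LLM;
# here it is any Python callable (hypothetical interface).
Annotator = Callable[[str], int]


def defect_rate(outputs: Sequence[str], annotate: Annotator, threshold: int) -> float:
    """Fraction of outputs whose annotated severity is at or above `threshold`."""
    if not outputs:
        raise ValueError("defect_rate needs at least one output")
    defects = sum(1 for text in outputs if annotate(text) >= threshold)
    return defects / len(outputs)


def toy_annotator(text: str) -> int:
    """Stand-in for an LLM judge: flags any output containing a marker word."""
    return 5 if "harmful" in text.lower() else 0


sample_outputs = ["a neutral reply", "a HARMFUL reply", "fine", "another harmful one"]
print(defect_rate(sample_outputs, toy_annotator, threshold=4))  # 0.5
```

The annotator is a plain callable, so the aggregation logic stays separate from how the judge model is prompted or invoked. Swapping in a different judge or harm area leaves `defect_rate` unchanged.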

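We propose a framework for automatically measuring responsible AI (RAI) metrics for large language models (LLMs) and related products and services. The framework builds on existing technical and sociotechnical expertise and uses the capabilities of state-of-the-art LLMs (such as GPT-4) to automatically measure harms arising when LLMs violate a range of RAI-related principles. It can be combined with domain-specific sociotechnical expertise to create measurements for new harm areas in the future. By implementing the framework, we aim to advance harm measurement efforts and further the responsible use of LLMs.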

A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Recent studies show that Natural Language Processing (NLP) technologies
propagate societal biases about demographic groups associated with attributes
such as gender, race, and nationality. To create interventions and mitigate
these biases and associated harms, it is vital to be able to detect and measure
such biases. While existing works propose bias evaluation and mitigation
methods for various tasks, there remains a need to cohesively understand which
biases these measures capture, the specific harms they measure, and how different measures compare
with each other. To address this gap, this work presents a practical framework
of harms and a series of questions that practitioners can answer to guide the
development of bias measures. As a validation of our framework and
documentation questions, we also present several case studies of how existing
bias measures in NLP -- both intrinsic measures of bias in representations and
extrinsic measures of bias of downstream applications -- can be aligned with
different harms, and how our proposed documentation questions facilitate a more
holistic understanding of what bias measures are measuring.

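This work proposes a framework of harms and a set of documentation questions for social biases in NLP technologies, and validates both through several case studies.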

On Measures of Biases and Harms in NLP

Algorithmic fairness aims to address the economic, moral, social, and
political impact that digital systems have on populations through solutions
that can be applied by service providers. Fairness frameworks do so, in part,
by mapping these problems to a narrow definition and assuming the service
providers can be trusted to deploy countermeasures. Not surprisingly, these
decisions limit fairness frameworks' ability to capture a variety of harms
caused by systems.
We characterize fairness limitations using concepts from requirements
engineering and from social sciences. We show that the focus on algorithms'
inputs and outputs misses harms that arise from systems interacting with the
world; that the focus on bias and discrimination omits broader harms on
populations and their environments; and that relying on service providers
excludes scenarios where they are uncooperative or intentionally adversarial.
We propose Protective Optimization Technologies (POTs). POTs provide a means
for affected parties to address the negative impacts of systems in the
environment, expanding avenues for political contestation. POTs intervene from
outside the system, do not require service providers to cooperate, and can
serve to correct, shift, or expose harms that systems impose on populations and
their environments. We illustrate the potential and limitations of POTs in two
case studies: countering road congestion caused by traffic-beating
applications, and recalibrating credit scoring for loan applicants.

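This paper analyzes the limitations of algorithmic fairness approaches and proposes Protective Optimization Technologies (POTs), which can expand avenues for political contestation and can correct, shift, or expose the harms that systems impose on populations and their environments.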