Socio-linguistic indicators of text, such as emotion or sentiment, are often
extracted using neural networks in order to better understand features of
social media. One indicator that is often overlooked, however, is the presence
of hazards within text. Recent psychological research suggests that statements
about hazards are more believable than statements about benefits (a property
known as negatively biased credulity), and that political liberals and
conservatives differ in how often they share hazards. Here, we develop a new
model to detect information concerning hazards, trained on a new collection of
annotated X posts, as well as urban legends annotated in previous work. We show
that not only does this model perform well (outperforming, e.g., zero-shot
human annotator proxies, such as GPT-4) but that the hazard information it
extracts is not strongly correlated with other indicators, namely moral
outrage, sentiment, emotions, and threat words. (That said, consonant with
expectations, hazard information does correlate positively with such emotions
as fear, and negatively with emotions like joy.) We then apply this model to
three datasets: X posts about COVID-19, X posts about the 2023 Hamas-Israel
war, and a new expanded collection of urban legends. From these data, we
uncover words associated with hazards unique to each dataset as well as
differences in this language between groups of users, such as conservatives and
liberals, which sheds light on what these groups perceive as hazards. We further show
that information about hazards peaks in frequency after major hazard events,
and therefore acts as an automated indicator of such events. Finally, we find
that information about hazards is especially prevalent in urban legends, which
is consistent with previous work that finds that reports of hazards are more
likely to be both believed and transmitted.

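Socio-linguistic indicators are often extracted from text using neural networks, yet hazard information is frequently overlooked. Prior psychological research suggests that statements about hazards are more believable than statements about benefits, and that political liberals and conservatives differ in how often they share hazards. A new model is trained on a new annotated dataset together with previously annotated data to detect hazard-related information. The hazard information it extracts is not strongly correlated with other indicators such as moral outrage, sentiment, emotions, and threat words, although it correlates positively with fear and negatively with joy. Applied to X posts about COVID-19, X posts about the 2023 Hamas-Israel war, and an expanded collection of urban legends, the model reveals hazard-related words specific to each dataset and differences between political groups, and the frequency of hazard information also serves as an automated indicator of major hazard events. Finally, hazard information is especially prevalent in urban legends, consistent with previous work finding that reports of hazards are more likely to be believed and transmitted.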

Trust and Terror: Negatively Biased Credulity and Partisan Negativity Bias Revealed in Text

Trust and Terror: Hazards in Text Reveal Negatively Biased Credulity and Partisan Negativity Bias

Public release of the weights of pretrained foundation models, otherwise
known as downloadable access \citep{solaiman_gradient_2023}, enables
fine-tuning without the prohibitive expense of pretraining. Our work argues
that increasingly accessible fine-tuning of downloadable models may increase
hazards. First, we highlight research to improve the accessibility of
fine-tuning. We split our discussion into research that A) reduces the
computational cost of fine-tuning and B) improves the ability to share that
cost across more actors. Second, we argue that increasingly accessible
fine-tuning methods may increase hazard through facilitating malicious use and
making oversight of models with potentially dangerous capabilities more
difficult. Third, we discuss potential mitigatory measures, as well as benefits
of more accessible fine-tuning. Given substantial remaining uncertainty about
hazards, we conclude by emphasizing the urgent need for the development of
mitigations.

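Public release of downloadable pretrained model weights allows models to be fine-tuned without the prohibitive expense of pretraining. This work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. Research that reduces the computational cost of fine-tuning and spreads that cost across more actors makes fine-tuning more accessible, which in turn may facilitate malicious use and make oversight of models with potentially dangerous capabilities more difficult. The development of mitigations is therefore urgently needed.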

The Growing Hazards of Fine-Tuning Downloadable Foundation Models

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

This chapter formulates seven lessons for preventing harm in artificial
intelligence (AI) systems based on insights from the field of system safety for
software-based automation in safety-critical domains. New applications of AI
across societal domains and public organizations and infrastructures come with
new hazards, which lead to new forms of harm, both grave and pernicious. The
text addresses the lack of consensus for diagnosing and eliminating new AI
system hazards. For decades, the field of system safety has dealt with
accidents and harm in safety-critical systems governed by varying degrees of
software-based automation and decision-making. This field embraces the core
assumption of systems and control that AI systems cannot be safeguarded by
technical design choices on the model or algorithm alone, instead requiring an
end-to-end hazard analysis and design frame that includes the context of use,
impacted stakeholders and the formal and informal institutional environment in
which the system operates. Safety and other values are thus inherently
socio-technical, emergent system properties that require design and control
measures to instantiate them across the technical, social and institutional
components of a system. This chapter honors system safety pioneer Nancy
Leveson by situating her core lessons in today's AI system safety challenges.
For every lesson, concrete tools are offered for rethinking and reorganizing
the safety management of AI systems, both in design and governance. This
history tells us that effective AI safety management requires transdisciplinary
approaches and a shared language that allows involvement of all levels of
society.

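This chapter formulates seven lessons for preventing harm in AI systems, drawing on insights from the field of system safety. It considers the new hazards that arise from new AI applications in public organizations and infrastructures, addresses the lack of consensus on diagnosing and eliminating new AI system hazards, and discusses the transdisciplinary approaches and shared language that effective AI safety management requires.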