As AI systems are increasingly incorporated into domains where human behavior
has set the norm, a challenge for AI governance and AI alignment research is to
regulate their behavior in a way that is useful and constructive for society.
One way to approach this challenge is to ask: how do we govern the human behavior
that the models are emulating? To evaluate human behavior, the American legal
system often uses the "Reasonable Person Standard." The idea of "reasonable"
behavior comes up in nearly every area of law. The legal system often judges
the actions of parties with respect to what a reasonable person would have done
under similar circumstances. This paper argues that the reasonable person
standard provides useful guidelines for the type of behavior we should develop,
probe, and stress-test in models. It explains how reasonableness is defined and
used in key areas of the law using illustrative cases, how the reasonable
person standard could apply to AI behavior in each of these areas and contexts,
and how our societal understanding of "reasonable" behavior provides useful
technical goals for AI researchers.

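A challenge for AI governance and AI alignment research is to regulate AI behavior in a way that is useful and constructive for society. The reasonable person standard offers useful guidance on the types of behavior we should develop, probe, and stress-test in models. The paper explains how reasonableness is defined and used in key areas of the law, and how society's understanding of reasonable behavior provides useful technical goals for AI researchers.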

The Reasonable Person Standard for AI

AI Alignment research seeks to align human and AI goals to ensure independent
actions by a machine are always ethical. This paper argues that empathy is
necessary for this task, despite often being neglected in favor of more deductive
approaches. We offer an inside-out approach that grounds morality within the
context of the brain as a basis for algorithmically understanding ethics and
empathy. These arguments are justified via a survey of relevant literature. The
paper concludes with a suggested experimental approach to future research and
some initial experimental observations.

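AI alignment research aims to ensure that a machine's independent actions are always ethical. This paper argues that empathy, though often neglected, is necessary for this task. We propose an inside-out approach that grounds morality in the context of the brain as a basis for algorithmically understanding ethics and empathy. These arguments are supported by a survey of the relevant literature, and the paper concludes with a suggested experimental approach for future research and some initial experimental observations.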

Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy

AI alignment research is the field of study dedicated to ensuring that
artificial intelligence (AI) benefits humans. As machine intelligence gets more
advanced, this research is becoming increasingly important. Researchers in the
field share ideas across different media to speed up the exchange of
information. However, this focus on speed means that the research landscape is
opaque, making it difficult for young researchers to enter the field. In this
project, we collected and analyzed existing AI alignment research. We found
that the field is growing quickly, with several subfields emerging in parallel.
We looked at the subfields and identified the prominent researchers, recurring
topics, and different modes of communication in each. Furthermore, we found
that a classifier trained on AI alignment research articles can detect relevant
articles that we did not originally include in the dataset. We are sharing the
dataset with the research community and hope to develop tools in the future
that will help both established researchers and young researchers get more
involved in the field.

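By analyzing existing AI alignment research, we found that the field is growing quickly, with several subfields emerging. We examined the subfields and identified the prominent researchers, recurring topics, and modes of communication in each. We also found that a classifier trained on AI alignment research articles can detect relevant articles that were not originally included in the dataset. We are sharing the dataset with the research community and hope to develop tools in the future that help both established and young researchers get more involved in the field.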
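
The classifier finding above can be illustrated with a toy example. The abstract does not say which model or features the authors used, so the sketch below is only an assumption: it swaps in a small multinomial naive Bayes classifier over word counts, and the class name `NaiveBayesRelevance`, the labels `"alignment"` and `"other"`, and the training sentences are all hypothetical and not taken from the authors' dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRelevance:
    """Toy multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all words seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-probability of each class for a text."""
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokens:
                score += math.log(
                    (self.word_counts[c][tok] + 1) / (total_words + len(self.vocab))
                )
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training data: three "alignment" texts and three unrelated ones.
train_texts = [
    "value alignment and reward modeling for safe AI systems",
    "interpretability research for AI alignment and safety",
    "corrigibility and oversight of advanced AI agents",
    "recipe for sourdough bread with a long fermentation",
    "football league results and transfer news",
    "travel guide to hiking trails in the mountains",
]
train_labels = ["alignment"] * 3 + ["other"] * 3

model = NaiveBayesRelevance().fit(train_texts, train_labels)

# An article that was not part of the training set.
candidate = "a new paper on oversight and safety of AI agents"
print(model.predict(candidate))
```

In this sketch, a text that shares vocabulary with the labeled alignment articles is flagged as relevant even though it was never in the training set, which is the kind of dataset expansion the abstract describes. A real pipeline would need far more data and a held-out evaluation.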