The advent of generative artificial intelligence and its widespread adoption
in society have engendered intensive debates about its ethical implications
and risks. These risks often differ from those associated with traditional
discriminative machine learning. To synthesize the recent discourse and map its
normative concepts, we conducted a scoping review on the ethics of generative
artificial intelligence, especially large language models and
text-to-image models. Our analysis provides a taxonomy of 378 normative issues
in 19 topic areas and ranks them according to their prevalence in the
literature. The study offers a comprehensive overview for scholars,
practitioners, and policymakers, condensing the ethical debates surrounding
fairness, safety, harmful content, hallucinations, privacy, interaction risks,
security, alignment, societal impacts, and others. We discuss the results,
evaluate imbalances in the literature, and explore unsubstantiated risk
scenarios.

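By analyzing 378 ethical issues across 19 topic areas, this study classifies
and ranks the ethical issues of generative artificial intelligence, focusing
mainly on large language models and image generation models. It offers
scholars, practitioners, and policymakers a comprehensive overview of the
ethical debates on fairness, safety, harmful content, hallucinations, privacy,
interaction risks, security, societal impacts, and more. It also discusses the
results, evaluates imbalances in the literature, and explores unsubstantiated
risk scenarios.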


Mapping the Ethics of Generative AI: A Comprehensive Scoping Review

We test the hypothesis that language models trained with reinforcement
learning from human feedback (RLHF) have the capability to "morally
self-correct" -- to avoid producing harmful outputs -- if instructed to do so.
We find strong evidence in support of this hypothesis across three different
experiments, each of which reveals a different facet of moral self-correction. We
find that the capability for moral self-correction emerges at 22B model
parameters, and typically improves with increasing model size and RLHF
training. We believe that at this level of scale, language models obtain two
capabilities that they can use for moral self-correction: (1) they can follow
instructions and (2) they can learn complex normative concepts of harm like
stereotyping, bias, and discrimination. As such, they can follow instructions
to avoid certain kinds of morally harmful outputs. We believe our results are
cause for cautious optimism regarding the ability to train language models to
abide by ethical principles.

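Through three experiments, we conclude that language models trained with
reinforcement learning from human feedback are capable of moral
self-correction and have the potential to abide by ethical principles.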