BriefGPT.xyz
Mar, 2024
What are human values, and how do we align AI to them?
Oliver Klingefjord, Ryan Lowe, Joe Edelman
TL;DR
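Using a moral graph approach, this paper studies how to synthesize diverse human value inputs in order to align the behavior of language models, and demonstrates the approach's effectiveness in a trial with 500 representative Americans.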
Abstract
There is an emerging consensus that we need to align AI systems with human values (Gabriel, 2020; Ji et al., 2024), but there is very little work on what that means and how we actually do it. We split the problem