BriefGPT.xyz
Aug, 2024
Beyond Preferences in AI Alignment
Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton
TL;DR
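This work examines the problems facing conventional AI alignment approaches and argues that preferences are not sufficient to fully capture human values. The paper proposes a new alignment framework in which AI systems are aligned with normative standards appropriate to their social roles, so as to facilitate negotiation among stakeholders, serve diverse goals, and reduce potential harm.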
Abstract
The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferenc