BriefGPT.xyz
May, 2024
There and Back Again: The AI Alignment Paradox
Robert West, Roland Aydin
TL;DR
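AI alignment harbors a paradox: the better we align AI models with our values, the easier it becomes for adversaries to misalign them. To safeguard human well-being, a broad community of researchers must be made aware of the AI alignment paradox and work toward ways out of it.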
Abstract
The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been in
→