The AI Alignment Paradox

May 2024

There and Back Again: The AI Alignment Paradox

Robert West, Roland Aydin

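TL;DR: AI alignment faces a paradox: the better we align AI models with our values, the easier we make it for adversaries to misalign them. To safeguard human well-being, it is essential that a broad community of researchers be aware of the AI alignment paradox and work to find ways to break out of it.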

Abstract

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been in