BriefGPT.xyz
Feb 2023
Goal Alignment: A Human-Aware Account of Value Alignment Problem
Malek Mechergui, Sarath Sreedharan
TL;DR
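The value alignment problem in AI arises when the objectives specified for an AI agent do not match the true underlying objective of its users. This paper proposes a new formulation of the value alignment problem, called goal alignment, and presents an interactive algorithm for determining the user's true underlying objective.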
Abstract
Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety ...