Aug, 2023
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal, John Dang, Aditya Grover
TL;DR
An analysis of how the design choices for sparse feedback and the choice of feedback protocol affect the alignment and evaluation of large language models (LLMs), finding that the preferences inferred from ratings and from rankings differ significantly for both human and AI annotators, and revealing key flaws in current methods for evaluating aligned LLMs along with a strong dependence on the feedback protocol.
Abstract
Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire and integrate, sparse feedback …
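The TL;DR above centers on comparing preferences inferred from ratings with those obtained directly from rankings. As a rough, hypothetical illustration of that comparison (not code from the paper), the Python sketch below converts per-response scores into implied pairwise preferences and measures how often they agree with explicit pairwise choices; the helper names and toy data are assumptions for illustration only.

```python
# Minimal sketch (not from the paper): infer pairwise preferences from
# ratings and measure how often they agree with explicit rankings.
# Function names and the toy data below are illustrative assumptions.
from typing import Optional, List, Tuple


def preference_from_ratings(score_a: float, score_b: float) -> Optional[str]:
    """Convert two absolute scores into an implied pairwise preference."""
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return None  # equal scores imply a tie; no strict preference


def agreement_rate(ratings: List[Tuple[float, float]],
                   rankings: List[str]) -> float:
    """Fraction of comparisons where the rating-implied preference matches
    the explicit ranking-based preference (ties from ratings are skipped)."""
    matches, counted = 0, 0
    for (score_a, score_b), ranked in zip(ratings, rankings):
        implied = preference_from_ratings(score_a, score_b)
        if implied is None:
            continue
        counted += 1
        matches += int(implied == ranked)
    return matches / counted if counted else 0.0


# Toy annotations: per-response scores on a 1-7 scale and the annotator's
# explicit pairwise choice for the same response pairs.
ratings = [(6, 4), (3, 5), (5, 5), (2, 6)]
rankings = ["A", "B", "A", "A"]

print(f"ratings-vs-rankings agreement: {agreement_rate(ratings, rankings):.2f}")
```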