BriefGPT.xyz
Feb, 2025
On-the-fly Preference Alignment via Principle-Guided Decoding
Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao
TL;DR
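This work addresses the inefficiency of aligning large language model generations with human values and preferences. We propose an on-the-fly preference alignment method that uses principle-guided decoding to adjust model outputs directly at inference time, avoiding the need for large-scale training data and compute. Experiments show that the method performs well on both general and personalized alignment tasks, demonstrating its efficiency and effectiveness.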
Abstract
With the rapidly expanding landscape of large language models, aligning model generations with human values and preferences is becoming increasingly important. Popular alignment methods, such as