BriefGPT.xyz
Jun, 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li, Yifan Wang, Ananth Grama, Ruqi Zhang
TL;DR
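Cascade Reward Sampling (CARDS) generates text that is both high-reward and high-likelihood efficiently and at low cost, substantially improving generation efficiency and alignment scores.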
Abstract
Aligning large language models (LLMs) with human preferences is critical for their deployment. Recently, decoding-time alignment has emerged as an effective plug-and-play technique that requires no fine-tuning of
→