BriefGPT.xyz
May, 2023
Reward Collapse in Aligning Large Language Models
Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su
TL;DR
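This study addresses the reward distribution collapse that arises when training large language models, and proposes a prompt-aware optimization scheme so that rewards can better distinguish between different prompts.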
Abstract
The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represente