Mar, 2024
Improving the Robustness of Large Language Models via Consistency Alignment
Zhao Yukun, Yan Lingyong, Sun Weiwei, Xing Guoliang, Wang Shuaiqiang...
TL;DR
Defines the instruction-inconsistency problem and proposes a two-stage training framework: in the first stage, similar-instruction augmentation helps the model follow instructions; in the second stage, the model learns to distinguish subtle differences among similar responses, improving both diversity and consistency with human expectations. A self-reward training process is used to validate the effectiveness of the framework.
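The two-stage framework in the TL;DR can be illustrated with a minimal data-flow sketch. All function names below are hypothetical placeholders, and a simple string-similarity score stands in for the model's own self-reward judgment; this is not the authors' implementation.

```python
from difflib import SequenceMatcher

def augment_with_similar_instructions(instruction, paraphrases):
    """Stage 1 (sketch): pair the original instruction with verbalized
    variants so the model is fine-tuned on many phrasings of the same
    task (similar-instruction augmentation)."""
    return [instruction] + list(paraphrases)

def consistency_reward(response_a, response_b):
    """Stage 2 (sketch): a self-reward scoring how consistent two
    responses to similar instructions are. A plain string-similarity
    ratio stands in for the model's own scoring here."""
    return SequenceMatcher(None, response_a, response_b).ratio()

def select_aligned_response(candidates, reference):
    """Prefer the candidate response that stays consistent with the
    reference, penalizing subtle divergences."""
    return max(candidates, key=lambda r: consistency_reward(r, reference))

# Toy usage: two candidates, one consistent with the reference, one not.
variants = augment_with_similar_instructions(
    "Summarize the article.",
    ["Give a brief summary of the article."])
best = select_aligned_response(
    ["The article argues X.", "Cats are mammals."],
    reference="The article argues X in three steps.")
```

In the actual training loop, the consistency reward would be produced by the model itself and used to update its parameters; the sketch only shows how augmented instructions and pairwise consistency scoring fit together.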
Abstract
Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significa…