BriefGPT.xyz
Nov, 2023
Tailoring Self-Rationalizers with Multi-Reward Distillation
Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li...
TL;DR
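This paper introduces MaRio, an algorithm that enables small language models (roughly 1/200 the size of GPT-3) to generate plausible, diverse, and consistent self-rationalizations, improving both question-answering accuracy and rationale quality. Human evaluation supports the viability of the MaRio approach.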
Abstract
Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful ...