BriefGPT.xyz
Nov, 2024
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
TL;DR
This study examines a common assumption in instruction tuning: that larger or stronger models are stronger teachers for smaller models. Through extensive experiments across multiple base models and response generators, the study finds that this assumption does not hold, and proposes a novel metric, Compatibility-Adjusted Reward (CAR), which more accurately evaluates the effectiveness of response generators; experimental results show that CAR outperforms almost all baseline metrics.
Abstract
Instruction Tuning has been widely adopted to ensure Large Language Models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs heavily rely on the instruction datasets.