BriefGPT.xyz
Oct, 2023
Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Ondrej Skopek, Rahul Aralikatte, Sian Gooding, Victor Carbune
TL;DR
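Through a meta-evaluation of a variety of metrics for measuring the instruction-following ability of large language models, this work analyzes how well evaluation methods agree with human judgment. It proposes LLM-based, reference-free evaluation methods that improve on traditional baselines and match the performance of costly reference-based metrics that require high-quality summaries.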
Abstract
Despite recent advances, evaluating how well large language models (LLMs) follow user instructions remains an open problem. While evaluation methods of language models have seen a rise in prompt-based approaches,