Fusing Evaluators with LLMs: Fusion-Eval

November 2023

Fusion-Eval: Integrating Evaluators with LLMs

Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu...

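TL;DR: Fusion-Eval, a new method that uses large language models for evaluation, achieves a Spearman correlation of 0.96 on the SummEval dataset, surpassing other evaluation methods and setting a new standard for LLM evaluation.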

Abstract

Evaluating large language models (LLMs) is a complex task, especially given the intricacies of natural language understanding and the expectations for high-level reasoning. Traditional evaluations typically