BriefGPT.xyz
Oct, 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu, Kejian Shi, Alexander R. Fabbri, Yilun Zhao, Peifeng Wang...
TL;DR
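This paper addresses problems in current automatic instruction-following evaluation, in particular the lack of comprehensive evaluation of large language model (LLM) evaluators. Through a thorough meta-evaluation of 25 base LLMs and 15 evaluation protocols, we identify the best-performing base LLMs and evaluation protocols, providing systematic support for future research.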
Abstract
The automatic evaluation of instruction following typically involves using large language models (LLMs) to assess response quality. However, there is a lack of comprehensive evaluation of these LLM-based evaluato