BriefGPT.xyz
Jun, 2024
BiGGen Bench: A Benchmark for Fine-grained Evaluation of Language Models
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim...
TL;DR
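This work introduces the BiGGen Bench, which comprehensively evaluates nine distinct generation capabilities of language models across 77 diverse tasks, using instance-specific evaluation criteria to mirror the nuanced discernment of human evaluation. The code, data, and evaluation results are publicly available.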
Abstract
As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation