BriefGPT.xyz
Feb, 2024
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu...
TL;DR
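By studying how LLMs perform on several key benchmarks, we explore strategies for reducing the number of evaluations needed to assess LLM performance. We release evaluation tools and tiny benchmarks, and show that they are sufficient to reproduce the original evaluation results reliably and efficiently.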
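The TL;DR does not say which subsampling strategies the paper uses, so what follows is only a minimal baseline sketch and not the authors' method. It estimates a model's full-benchmark accuracy from a uniform random subset of examples and reports a standard error. The function name `estimate_accuracy` and the list of 0/1 per-example scores it takes are hypothetical.

```python
import math
import random


def estimate_accuracy(correct, n_sub, seed=0):
    """Estimate full-benchmark accuracy from a uniform random subset.

    Hypothetical helper: `correct` is a list of 0/1 per-example scores for
    the whole benchmark; only `n_sub` of them are "evaluated".
    Returns (estimated_accuracy, standard_error).
    """
    rng = random.Random(seed)
    n = len(correct)
    idx = rng.sample(range(n), n_sub)  # sample examples without replacement
    sub = [correct[i] for i in idx]
    p = sum(sub) / n_sub
    # Binomial standard error with a finite-population correction,
    # which drives the error to 0 when the whole benchmark is evaluated.
    fpc = (n - n_sub) / (n - 1) if n > 1 else 0.0
    se = math.sqrt(p * (1 - p) / n_sub * fpc)
    return p, se


# Synthetic 10,000-item benchmark on which the model is right 70% of the time.
scores = [1] * 7000 + [0] * 3000
est, se = estimate_accuracy(scores, n_sub=100)
print(f"estimated accuracy {est:.3f} ± {1.96 * se:.3f}")
```

Uniform sampling is the simplest way to cut evaluation cost. Its standard error shrinks only roughly as 1/sqrt(n_sub), so a purely random subset needs a fairly large number of examples before its estimate becomes tight.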
Abstract
The versatility of large language models (LLMs) led to the creation of diverse benchmarks that thoroughly test a variety of language model