BriefGPT.xyz
Jan, 2024
Benchmarking LLMs via Uncertainty Quantification
Fanghua Ye, Mingming Yang, Jianhui Pang, Longyue Wang, Derek F. Wong...
TL;DR
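Using a new benchmarking approach that incorporates uncertainty quantification, this study finds that LLMs with higher accuracy may exhibit lower certainty, that larger-scale LLMs may display greater uncertainty than their smaller counterparts, and that instruction fine-tuning tends to increase the uncertainty of LLMs. These results underscore the importance of incorporating uncertainty into the evaluation of LLMs.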
Abstract
The proliferation of open-source large language models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the wid