BriefGPT.xyz
Oct, 2024
Black-box Uncertainty Quantification Method for LLM-as-a-Judge
Nico Wagner, Michael Desmond, Rahul Nair, Zahra Ashktorab, Elizabeth M. Daly...
TL;DR
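This study addresses the problem of quantifying uncertainty in large language model (LLM) evaluation, in particular the challenges that arise when applying the LLM-as-a-Judge approach. We propose a novel method that quantifies uncertainty by analyzing the relationships between generated assessments and possible ratings. The method shows a strong correlation with evaluation accuracy, which helps improve the reliability and consistency of LLM evaluations.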
Abstract
LLM-as-a-Judge is a widely used method for evaluating the performance of Large Language Models (LLMs) across various tasks. We address the challenge of quantifying the uncertainty of LLM-as-a-Judge evaluations. While
Un