BriefGPT.xyz
Jun, 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker, Stefano Soatto
TL;DR
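Using explanation entailment as the classifier likelihood, we propose a framework for measuring language model uncertainty that improves confidence metrics (AURC and AUROC).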
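One way to read the TL;DR is that each sampled explanation acts as a small classifier over the candidate answers, and its vote is weighted by how strongly the explanation is entailed. The sketch below illustrates only that weighting step. It is not the authors' implementation: the function name, the input layout, and the example numbers are all hypothetical, and in practice the entailment weights and per-explanation answer probabilities would come from an LLM or an entailment model.

```python
from collections import defaultdict


def stable_explanation_confidence(explanations):
    """Combine per-explanation answer distributions into one distribution.

    explanations: list of (entailment_weight, answer_probs) pairs, where
      entailment_weight is a non-negative score for how strongly the
        explanation is entailed (hypothetical input, assumed precomputed), and
      answer_probs maps each candidate answer to its probability when the
        model is conditioned on that explanation.
    """
    total = sum(weight for weight, _ in explanations)
    if total <= 0:
        raise ValueError("at least one explanation needs a positive weight")

    posterior = defaultdict(float)
    for weight, answer_probs in explanations:
        for answer, prob in answer_probs.items():
            # Normalised entailment weight times the explanation's own vote.
            posterior[answer] += (weight / total) * prob
    return dict(posterior)


# Made-up numbers for a two-option question with three sampled explanations.
example = [
    (0.9, {"A": 0.8, "B": 0.2}),
    (0.6, {"A": 0.7, "B": 0.3}),
    (0.1, {"A": 0.1, "B": 0.9}),
]
posterior = stable_explanation_confidence(example)
prediction = max(posterior, key=posterior.get)
confidence = posterior[prediction]
```

Here `confidence` is the weighted share of explanation mass behind the predicted answer. A score of this kind is what metrics such as AURC and AUROC would then evaluate against correctness labels.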
Abstract
In many high-risk machine learning applications it is essential for a model to indicate when it is uncertain about a prediction. While large language models (LLMs) can reach and even surpass human-level accuracy…