Large Language Models (LLMs), such as ChatGPT, have achieved impressive milestones in natural language processing (NLP). Despite their strong performance, these models are known to pose important risks. As they are deployed in real-world applications, a systematic understanding of the different risks they pose on tasks such as natural language inference (NLI) is much needed. In this paper, we define and formalize two distinct types of risk: decision risk and composite risk. We also propose a risk-centric evaluation framework and four novel metrics for assessing LLMs on these risks in both in-domain and out-of-domain settings. Finally, we propose a risk-adjusted calibration method called DwD that helps LLMs minimize these risks in an overall NLI architecture. Detailed experiments using four NLI benchmarks, three baselines, and two LLMs, including ChatGPT, show both the practical utility of the evaluation framework and the efficacy of DwD in reducing decision and composite risk. For instance, with DwD, an underlying LLM is able to address an additional 20.1% of low-risk inference tasks that it would otherwise erroneously deem high-risk, and to skip a further 19.8% of high-risk tasks that it would otherwise have answered incorrectly.
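To make the answer-or-skip accounting behind the last two figures concrete, here is a minimal sketch of a generic selective-prediction tally. It is not the DwD method, whose calibration is not described in this abstract. It assumes, hypothetically, that a task counts as low-risk when the LLM's prediction would be correct and high-risk when it would be wrong, and that the model answers whenever its confidence reaches a fixed threshold. The names `Tally`, `tally`, and `compare_policies`, the 0.5 threshold, and the per-category normalization are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Tally:
    answered_correct: int  # answered, prediction right
    answered_wrong: int    # answered, prediction wrong (the costly case)
    skipped_correct: int   # skipped, although the prediction would have been right
    skipped_wrong: int     # skipped, and the prediction would have been wrong


def tally(confidences: Sequence[float], correct: Sequence[bool], threshold: float) -> Tally:
    """Answer when confidence >= threshold, otherwise skip; count each outcome."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    counts = [0, 0, 0, 0]
    for conf, ok in zip(confidences, correct):
        answered = conf >= threshold
        # Index: 0 answered/right, 1 answered/wrong, 2 skipped/right, 3 skipped/wrong
        counts[(0 if answered else 2) + (0 if ok else 1)] += 1
    return Tally(*counts)


def compare_policies(
    raw: Sequence[float],
    adjusted: Sequence[float],
    correct: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (extra low-risk tasks answered, extra high-risk tasks skipped),
    each as a fraction of its own category, when adjusted scores replace raw ones.

    Hypothetical assumption: low-risk = prediction correct, high-risk = prediction wrong.
    """
    if not (len(raw) == len(adjusted) == len(correct)):
        raise ValueError("inputs must have the same length")
    low = [(r, a) for r, a, ok in zip(raw, adjusted, correct) if ok]
    high = [(r, a) for r, a, ok in zip(raw, adjusted, correct) if not ok]
    # Low-risk tasks that raw scores skipped but adjusted scores answer.
    gained = sum(1 for r, a in low if r < threshold <= a)
    # High-risk tasks that raw scores answered but adjusted scores skip.
    avoided = sum(1 for r, a in high if a < threshold <= r)
    return (
        gained / len(low) if low else 0.0,
        avoided / len(high) if high else 0.0,
    )
```

For example, with raw scores `[0.4, 0.9, 0.8, 0.3]`, adjusted scores `[0.7, 0.9, 0.2, 0.1]`, and correctness `[True, True, False, False]`, `compare_policies` returns `(0.5, 0.5)`: the adjusted scores newly answer one of the two low-risk tasks and newly skip one of the two high-risk tasks. Normalizing each gain by its own category mirrors the phrasing "of low-risk tasks" and "of high-risk tasks", but the paper's actual normalization may differ.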

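Although large language models have achieved remarkable results in natural language processing, they also pose important risks. This paper defines and formalizes two distinct types of risk, decision risk and composite risk, and proposes a risk-centric evaluation framework and four new metrics for assessing them. Finally, we propose a risk-adjusted calibration method called DwD that helps large language models reduce these risks within an overall natural language inference architecture. Experiments demonstrate the practical utility of the evaluation framework and the efficacy of DwD in reducing decision risk and composite risk.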

A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores