Large language models (LLMs) have been treated as knowledge bases due to
their strong performance in knowledge probing tasks. LLMs are typically
evaluated using accuracy, yet this metric does not capture the vulnerability of
LLMs to hallucination-inducing factors like prompt and context variability. How
do we evaluate the capabilities of LLMs to consistently produce factually
correct answers? In this paper, we propose MOdel kNowledge relIabiliTy scORe
(MONITOR), a novel metric designed to directly measure LLMs' factual
reliability. MONITOR computes the distance between the probability
distributions of a valid output and its counterparts produced by the same LLM
probing the same fact using different styles of prompts and
contexts.Experiments on a comprehensive range of 12 LLMs demonstrate the
effectiveness of MONITOR in evaluating the factual reliability of LLMs while
maintaining a low computational overhead. In addition, we release the FKTC
(Factual Knowledge Test Corpus) test set, containing 210,158 prompts in total
to foster research along this line (this https URL).

本文提出了一种名为 MONITOR 的新度量方法，用于直接衡量大型语言模型的事实可靠性，通过计算有效输出与同一模型使用不同类型提示和上下文进行探索所产生的对应输出之间的概率分布距离来评估模型的一致性。实验证明 MONITOR 对于评估大型语言模型的事实可靠性具有良好的效果，并且计算开销较低。此外，作者还发布了包含 210,158 个提示的 FKTC 测试集，以促进相关研究的开展。

评估大型语言模型知识的可靠性

Assessing the Reliability of Large Language Model Knowledge

In safety-critical applications, autonomous agents may need to learn in an
environment where mistakes can be very costly. In such settings, the agent
needs to behave safely not only after but also while learning. To achieve this,
existing safe reinforcement learning methods make an agent rely on priors that
let it avoid dangerous situations during exploration with high probability, but
both the probabilistic guarantees and the smoothness assumptions inherent in
the priors are not viable in many scenarios of interest such as autonomous
driving. This paper presents an alternative approach inspired by human
teaching, where an agent learns under the supervision of an automatic
instructor that saves the agent from violating constraints during learning. In
this model, we introduce the monitor that neither needs to know how to do well
at the task the agent is learning nor needs to know how the environment works.
Instead, it has a library of reset controllers that it activates when the agent
starts behaving dangerously, preventing it from doing damage. Crucially, the
choices of which reset controller to apply in which situation affect the speed
of agent learning. Based on observing agents' progress, the teacher itself
learns a policy for choosing the reset controllers, a curriculum, to optimize
the agent's final policy reward. Our experiments use this framework in two
environments to induce curricula for safe and efficient learning.

本文提出一种受人类教学启发的替代方法，即代理在自动指导监督下学习，其中引入了监视器来防止其在学习过程中违反约束条件。