Large language models (LLMs) have demonstrated impressive capabilities not only in storing and recalling factual knowledge but also in adapting to novel in-context information. Yet the mechanisms underlying their in-context grounding remain unknown, especially in situations where in-context information contradicts factual knowledge embedded in the parameters. This is critical for retrieval-augmented generation methods, which enrich the context with up-to-date information in the hope that grounding can rectify outdated parametric knowledge. In this study, we introduce Fakepedia, a counterfactual dataset designed to evaluate grounding abilities when the parametric knowledge clashes with the in-context information. We benchmark various LLMs with Fakepedia and find that GPT-4-turbo has a strong preference for its parametric knowledge. Mistral-7B, by contrast, is the model that most robustly chooses the grounded answer. We then conduct causal mediation analysis on LLM components when answering Fakepedia queries. We demonstrate that inspection of the computational graph alone can predict LLM grounding with 92.8% accuracy, in particular because a few MLPs in the Transformer can predict non-grounded behavior. Our results, together with existing findings about factual recall mechanisms, provide a coherent narrative of how grounding and factual recall mechanisms interact within LLMs.

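Summary: This study introduces Fakepedia, a counterfactual dataset for evaluating the grounding abilities of large language models when parametric knowledge conflicts with in-context information. We test a range of large language models on Fakepedia and find that GPT-4-turbo prefers its parametric knowledge, whereas Mistral-7B most consistently chooses the grounded answer. In addition, we perform causal mediation analysis on large language models; the results show that inspecting the computational graph alone can predict grounding with 92.8% accuracy, in particular because a few MLPs in the Transformer can predict non-grounded behavior. Combined with existing findings on factual recall mechanisms, our results provide a coherent account of how grounding and factual recall mechanisms interact in large language models.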

Locating and Detecting Flaws in Language Model Grounding with Fakepedia