When large language models are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that large language models mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.
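The controlled setup described above, in which the proportion of fine-tuning examples that introduce new knowledge is varied, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_finetuning_mix`, the dictionary fields, and the assumption that each closed-book QA example has already been labeled as known or unknown to the pre-trained model are all hypothetical and chosen only for illustration.

```python
import random


def build_finetuning_mix(known, unknown, unknown_fraction, size, seed=0):
    """Sample `size` QA examples so that a fixed fraction introduces new knowledge.

    `known` and `unknown` are lists of examples that are assumed to be labeled
    already (hypothetical labeling; how to decide what the model knows is not
    covered by this sketch).
    """
    if not 0.0 <= unknown_fraction <= 1.0:
        raise ValueError("unknown_fraction must be between 0 and 1")

    n_unknown = round(size * unknown_fraction)
    n_known = size - n_unknown
    if n_unknown > len(unknown) or n_known > len(known):
        raise ValueError("not enough examples to build the requested mix")

    rng = random.Random(seed)  # local RNG keeps the mix reproducible
    mix = rng.sample(unknown, n_unknown) + rng.sample(known, n_known)
    rng.shuffle(mix)  # avoid ordering all unknown examples first
    return mix


if __name__ == "__main__":
    # Toy placeholder data, not taken from any real dataset.
    known = [{"q": f"known question {i}", "a": f"answer {i}", "new": False} for i in range(20)]
    unknown = [{"q": f"unknown question {i}", "a": f"answer {i}", "new": True} for i in range(20)]

    # Sweep the proportion of examples that introduce new knowledge.
    for fraction in (0.0, 0.25, 0.5, 1.0):
        mix = build_finetuning_mix(known, unknown, fraction, size=8)
        n_new = sum(example["new"] for example in mix)
        print(f"fraction={fraction}: {n_new} of {len(mix)} examples introduce new knowledge")
```

Fixing the total size while sweeping `unknown_fraction` keeps the amount of fine-tuning data constant, so any difference between runs can be attributed to the composition of the data rather than to its volume.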

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?