We evaluate the ability of Large Language Models (LLMs) to discern and
express their internal knowledge state, a key factor in countering factual
hallucination and ensuring reliable application of LLMs. We observe a robust
self-awareness of internal knowledge state in LLMs, evidenced by over 85%
accuracy in knowledge probing. However, LLMs often fail to express their
internal knowledge during generation, leading to factual hallucinations. We
develop an automated hallucination annotation tool, Dreamcatcher, which merges
knowledge probing and consistency checking methods to rank factual preference
data. Using knowledge preference as reward, We propose a Reinforcement Learning
from Knowledge Feedback (RLKF) training framework, leveraging reinforcement
learning to enhance the factuality and honesty of LLMs. Our experiments across
multiple models show that RLKF training effectively enhances the ability of
models to utilize their internal knowledge state, boosting performance in a
variety of knowledge-based and honesty-related tasks.

通过使用知识探测、一致性检查和强化学习等方法，我们发现大型语言模型在辨别和表达其内部知识状态方面具有强大的自我意识，然而它们在生成过程中常常无法表达其内部知识，导致虚构。为此，我们提出了一种自动虚构注释工具，通过梦网，该工具将知识探测和一致性检查方法结合起来，以排名虚构偏好数据。通过使用知识偏好作为奖励，我们提出了一种从知识反馈中强化学习（RLKF）的训练框架，利用强化学习增强大型语言模型的真实性和诚实性。我们对多个模型进行的实验证明，RLKF 训练有效地增强了模型利用其内部知识状态的能力，在各种基于知识和诚实性的任务中提高了性能。