Large language models (LLMs) have been found to produce hallucinations when the question exceeds their internal knowledge boundaries. A reliable model should have a clear perception of its knowledge boundaries, providing correct answers within its scope and refusing to answer when it lacks knowledge. Existing research on LLMs' perception of their knowledge boundaries typically uses either the probability of the generated tokens or the verbalized confidence as the model's confidence in its response. However, these studies overlook the differences and connections between the two. In this paper, we conduct a comprehensive analysis and comparison of LLMs' probabilistic perception and verbalized perception of their factual knowledge boundaries. First, we investigate the pros and cons of these two perceptions. Then, we study how they change under questions of varying frequencies. Finally, we measure the correlation between LLMs' probabilistic confidence and verbalized confidence. Experimental results show that 1) LLMs' probabilistic perception is generally more accurate than verbalized perception but requires an in-domain validation set to adjust the confidence threshold. 2) Both perceptions perform better on less frequent questions. 3) It is challenging for LLMs to accurately express their internal confidence in natural language.

本研究解决了大型语言模型（LLMs）在知识边界感知方面的不足，重点分析了模型在生成的概率与口头信心之间的差异和联系。通过比较，发现概率感知通常比口头感知更准确，但需要领域内的验证集以调整信心阈值，且二者在处理不常见问题时表现更佳。这一发现有助于提升模型在回答超出其知识范围问题时的可靠性。

大型语言模型在其概率或口头信心中的诚实性比较