This study addresses the hallucination problem in large language models (LLMs). We adopt Retrieval-Augmented Generation (RAG) (Lewis et al., 2020), a technique that places relevant retrieved information in the prompt so that the model can produce accurate answers. However, RAG depends on retrieving the correct information, and this retrieval step is itself error-prone. To address this, we use the Dense Passage Retrieval (DPR) model (Karpukhin et al., 2020) to fetch domain-specific documents related to user queries. The standard DPR model nevertheless still lacks accuracy in document retrieval. We therefore enhance DPR by incorporating control tokens. The enhanced model significantly outperforms the standard DPR model, with a 13% improvement in Top-1 accuracy and a 4% improvement in Top-20 accuracy.
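
To make the reported metrics concrete, the following minimal sketch shows one way a control token could be prepended to a text and how Top-k retrieval accuracy is commonly computed from query-passage similarity scores. The bracketed token format, the function names `add_control_token` and `top_k_accuracy`, and the toy scores are all illustrative assumptions; they are not taken from the paper, whose actual control-token scheme may differ.

```python
from typing import Sequence


def add_control_token(text: str, label: str) -> str:
    """Prepend a hypothetical control token such as "[account]" to a text."""
    return f"[{label}] {text}"


def top_k_accuracy(
    scores: Sequence[Sequence[float]], gold: Sequence[int], k: int
) -> float:
    """Fraction of queries whose gold passage is among the k highest-scored."""
    hits = 0
    for row, gold_idx in zip(scores, gold):
        # Rank passage indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if gold_idx in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy similarity scores: 3 queries x 4 passages (made-up numbers).
scores = [
    [0.9, 0.1, 0.3, 0.2],  # gold passage 0 is ranked first
    [0.2, 0.8, 0.5, 0.1],  # gold passage 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # gold passage 3 is ranked last
]
gold = [0, 2, 3]

print(add_control_token("How do I reset my password?", "account"))
print(top_k_accuracy(scores, gold, k=1))
print(top_k_accuracy(scores, gold, k=2))
```

In a real DPR setup the scores would come from dot products between query and passage embeddings, and any control token would typically also need to be registered in the tokenizer vocabulary; both steps are omitted here.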

Control Tokens and Dense Passage Retrieval