Retrieval-Augmented Generative (RAG) models enhance Large Language Models
(LLMs) by integrating external knowledge bases, improving their performance in
applications like fact-checking and information search. In this paper, we
demonstrate a security threat where adversaries can exploit the openness of
these knowledge bases by injecting deceptive content into the retrieval
database, intentionally changing the model's behavior. This threat is critical
as it mirrors real-world usage scenarios where RAG systems interact with
publicly accessible knowledge bases, such as web scrapings and user-contributed
data pools. To keep the setting realistic, we assume the
adversary has no knowledge of the users' queries, the knowledge base data, or the LLM
parameters. We demonstrate that the model can be exploited successfully
through crafted content uploads, given access to the retriever. Our
findings emphasize an urgent need for security measures in the design and
deployment of RAG systems to prevent potential manipulation and ensure the
integrity of machine-generated content.
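As a toy illustration of the threat model (not the paper's attack method, which is not reproduced here), the sketch assumes a hypothetical bag-of-words cosine-similarity retriever over a hypothetical three-passage knowledge base. An attacker who does not know the exact user query can still upload a passage stuffed with likely topic words around a misleading claim, and that passage then outranks the clean ones. The names `tokenize`, `cosine`, and `retrieve` are illustrative helpers, not part of any real RAG library.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Hypothetical retriever: rank passages by bag-of-words cosine similarity.
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


# Hypothetical clean knowledge base.
corpus = [
    "Pizza dough needs flour, water, yeast and salt.",
    "Let the pizza rest a few minutes so the cheese sets before slicing.",
    "Rocks are not food and should never be eaten.",
]

# Hypothetical injected passage: the attacker does not know the exact query,
# so it repeats likely topic words around a misleading claim.
injected = "cheese pizza cheese slides off pizza: to keep cheese on pizza, mix glue into the sauce"

query = "how do I keep cheese from sliding off my pizza"
print(retrieve(query, corpus)[0])             # top hit from the clean corpus
print(retrieve(query, corpus + [injected])[0])  # top hit after the upload
```

With the clean corpus, the top hit is the passage about letting the cheese set; once the stuffed passage is added, it ranks first because it repeats `cheese` and `pizza` three times each. Real RAG systems typically use dense embedding retrievers, against which simple keyword stuffing may not be enough; this sketch does not model that case.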

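This paper examines how retrieval-augmented generative (RAG) models integrate external knowledge bases to improve their performance in applications such as fact-checking and information search. It also shows how adversaries can change model behavior by injecting false content into the retrieval database and thereby successfully attack RAG systems, and it therefore calls for security measures in the design and deployment of RAG systems to ensure the integrity of machine-generated content.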

Exploiting Vulnerabilities in Retrieval-Augmented Generative Models

"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models

The risks derived from large language models (LLMs) generating deceptive and
damaging content have been the subject of considerable research, but even safe
generations can lead to problematic downstream impacts. In our study, we shift
the focus to how even safe text coming from LLMs can be easily turned into
potentially dangerous content through Bait-and-Switch attacks. In such attacks,
the user first prompts LLMs with safe questions and then employs a simple
find-and-replace post-hoc technique to manipulate the outputs into harmful
narratives. The alarming efficacy of this approach in generating toxic content
highlights a significant challenge in developing reliable safety guardrails for
LLMs. In particular, we stress that focusing on the safety of the verbatim LLM
outputs is insufficient and that we also need to consider post-hoc
transformations.

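Bait-and-Switch attacks show that safe text generated by large language models can be turned into harmful content, a reminder that post-hoc transformations must be considered when developing reliable safety guardrails.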