Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database and then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with potentially untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single "blocker" document to the database that will be retrieved in response to a specific query and, furthermore, cause the RAG system not to answer the query, ostensibly because it lacks the information or because the answer is unsafe. We describe and analyze several methods for generating blocker documents, including a new method based on black-box optimization that does not require the adversary to know the embedding or LLM used by the target RAG system, or to have access to an auxiliary LLM for generating blocker documents. We measure the efficacy of the considered methods against several LLMs and embeddings, and demonstrate that existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents.
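To make the retrieval half of a jamming attack concrete, the toy sketch below shows how a single added document can displace the genuinely relevant one in a RAG system's top-1 retrieval. It is an illustration under loud assumptions, not the method evaluated here: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the two-document database is invented, and the blocker is simply the query text followed by a refusal-style sentence. Real blocker documents are produced by the generation methods described above, and the subsequent LLM generation step is not modeled.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, database: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (the context passed to the LLM)."""
    q = embed(query)
    ranked = sorted(database, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical knowledge database containing one relevant document.
database = [
    "The Eiffel Tower is 330 metres tall and stands in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]
query = "How tall is the Eiffel Tower?"

# Hypothetical blocker: repeating the query makes it highly similar to that query,
# and the appended sentence is the kind of text meant to make the generator refuse.
blocker = query + " I cannot answer this question because it is unsafe."

clean_context = retrieve(query, database)
jammed_context = retrieve(query, database + [blocker])
print(clean_context[0])              # the relevant Eiffel Tower document
print(jammed_context[0] == blocker)  # the blocker now occupies the only context slot
```

With `k = 1`, adding the blocker means the LLM no longer sees the relevant document at all; with larger `k`, the blocker would instead sit alongside it, and jamming then depends on how the LLM reacts to its content.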


Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents