Instruction-following retrievers have been widely adopted alongside LLMs in real-world applications, but little work has investigated the safety risks surrounding their increasing search capabilities. We empirically study the ability of retrievers to satisfy malicious queries, both when used directly and when used in a retrieval-augmented generation setup. Concretely, we investigate six leading retrievers, including NV-Embed and LLM2Vec, and find that, given malicious requests, most retrievers select relevant harmful passages for more than 50% of queries. For example, LLM2Vec correctly selects passages for 61.35% of our malicious queries. We further uncover an emerging risk with instruction-following retrievers, where highly relevant harmful information can be surfaced by exploiting their instruction-following capabilities. Finally, we show that even safety-aligned LLMs, such as Llama3, can satisfy malicious requests when provided with harmful retrieved passages in-context. In summary, our findings underscore the malicious misuse risks associated with increasing retriever capability.

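This study examines the safety risks of instruction-following retrievers when they are asked to satisfy malicious queries, addressing a gap in research on this topic. Through an empirical study, we find that six leading retrievers can select relevant harmful information when given malicious requests, and that this risk is closely tied to the retrievers' instruction-following capabilities. This indicates that the risk of malicious misuse grows as retrieval capability improves.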

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval