BriefGPT.xyz
Mar, 2025
Defeating Prompt Injections by Design
Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini...
TL;DR
This work addresses the problem that large language models are vulnerable to prompt injection attacks when processing untrusted data. The proposed defense, CaMeL, creates a protective layer around the model, ensuring safe operation even when the underlying model itself remains susceptible to attack. A key finding is that CaMeL solves 67% of the tasks on the AgentDojo benchmark, demonstrating both its effectiveness and its security.
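The TL;DR describes CaMeL's protective layer only at a high level. As a rough illustration of the underlying design idea (control flow fixed by the trusted user query, untrusted data tracked with provenance labels, and a policy check before every tool call), here is a minimal Python sketch. All names here (Value, quarantined_parse, policy_allows, the example tools) are hypothetical, chosen for illustration, and are not the paper's actual API.

# Hypothetical sketch of a CaMeL-style design: separate control flow
# (derived from the trusted user query) from untrusted data, and check a
# security policy before any tool call that consumes tainted values.
from dataclasses import dataclass

@dataclass
class Value:
    """A value tagged with its provenance (a capability-style label)."""
    data: str
    trusted: bool  # False if it ever came from untrusted tool output

def quarantined_parse(untrusted_text: str) -> Value:
    # In designs like this, a "quarantined" model extracts structured data
    # from untrusted content; its output cannot inject new actions, because
    # the control flow was already fixed by the trusted query.
    return Value(data=untrusted_text.strip(), trusted=False)

def policy_allows(tool: str, args: list[Value]) -> bool:
    # Example policy: email recipients may not be derived from untrusted data.
    if tool == "send_email":
        recipient = args[0]
        return recipient.trusted
    return True

def call_tool(tool: str, *args: Value) -> Value:
    if not policy_allows(tool, list(args)):
        raise PermissionError(f"policy blocked {tool} with untrusted args")
    print(f"[tool] {tool}({', '.join(a.data for a in args)})")
    # Anything returned by a tool is treated as untrusted by default.
    return Value(data=f"result of {tool}", trusted=False)

# The control flow below is fixed by the trusted user query ("email Bob a
# summary of the document"); untrusted document content flows only as data.
doc = call_tool("read_document", Value("report.txt", trusted=True))
summary = quarantined_parse(doc.data)               # tainted Value
recipient = Value("bob@example.com", trusted=True)  # from the user's query
call_tool("send_email", recipient, summary)         # allowed: recipient trusted

If an injected instruction inside the document tried to redirect the email, the new recipient would carry trusted=False and the policy check would block the call, which is the sense in which such a layer can protect even a vulnerable underlying model.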
Abstract
Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data.