BriefGPT.xyz
Mar, 2025
Defeating Prompt Injections by Design
Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini...
TL;DR
This work addresses the problem that large language models are vulnerable to prompt injection attacks when processing untrusted data. The proposed defense, CaMeL, creates a protective layer around the model, ensuring safe operation even when the underlying model itself remains susceptible to attack. A key finding is that CaMeL solves 67% of the tasks on the AgentDojo benchmark, demonstrating both its effectiveness and its security.
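The TL;DR describes CaMeL's protective layer only at a high level. As a rough illustration of the underlying design idea (control flow fixed by the trusted user query, untrusted data tracked with provenance labels, and a policy check before every tool call), here is a minimal Python sketch. All names here (Value, quarantined_parse, policy_allows, the example tools) are hypothetical, chosen for illustration, and are not the paper's actual API.

# Hypothetical sketch of a CaMeL-style design: separate control flow
# (derived from the trusted user query) from untrusted data, and check a
# security policy before any tool call that consumes tainted values.
from dataclasses import dataclass

@dataclass
class Value:
    """A value tagged with its provenance (a capability-style label)."""
    data: str
    trusted: bool  # False if it ever came from untrusted tool output

def quarantined_parse(untrusted_text: str) -> Value:
    # In designs like this, a "quarantined" model extracts structured data
    # from untrusted content; its output cannot inject new actions, because
    # the control flow was already fixed by the trusted query.
    return Value(data=untrusted_text.strip(), trusted=False)

def policy_allows(tool: str, args: list[Value]) -> bool:
    # Example policy: email recipients may not be derived from untrusted data.
    if tool == "send_email":
        recipient = args[0]
        return recipient.trusted
    return True

def call_tool(tool: str, *args: Value) -> Value:
    if not policy_allows(tool, list(args)):
        raise PermissionError(f"policy blocked {tool} with untrusted args")
    print(f"[tool] {tool}({', '.join(a.data for a in args)})")
    # Anything returned by a tool is treated as untrusted by default.
    return Value(data=f"result of {tool}", trusted=False)

# The control flow below is fixed by the trusted user query ("email Bob a
# summary of the document"); untrusted document content flows only as data.
doc = call_tool("read_document", Value("report.txt", trusted=True))
summary = quarantined_parse(doc.data)               # tainted Value
recipient = Value("bob@example.com", trusted=True)  # from the user's query
call_tool("send_email", recipient, summary)         # allowed: recipient trusted

If an injected instruction inside the document tried to redirect the email, the new recipient would carry trusted=False and the policy check would block the call, which is the sense in which such a layer can protect even a vulnerable underlying model.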
Abstract
Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data.