BriefGPT.xyz
Mar, 2025
In-Context Defense in Computer Agents: An Empirical Study
Pei Yang, Hai Ci, Mike Zheng Shou
TL;DR
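This study addresses the vulnerability of computer agents powered by vision-language models to context deception attacks, and proposes a new method called "in-context defense". By supplying a small set of carefully curated exemplars that guide the agent to reason defensively, the method substantially improves the agent's resistance to deception attacks; in the experiments, the attack success rate dropped by 91.2%.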
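The idea of guiding an agent with a few curated exemplars can be illustrated with ordinary few-shot prompting. The sketch below is only an illustration under assumptions, not the authors' implementation: the `DefenseExemplar` class, the `build_prompt` function, and the prompt layout are all hypothetical, and a plain-text observation stands in for the screenshot a real VLM-based agent would receive.

```python
from dataclasses import dataclass


@dataclass
class DefenseExemplar:
    """Hypothetical container for one curated defensive example."""
    observation: str  # text description of what the agent saw on screen
    reasoning: str    # defensive reasoning that flags the deceptive element
    action: str       # the safe action chosen after that reasoning


def build_prompt(task: str, observation: str,
                 exemplars: list[DefenseExemplar]) -> str:
    """Prepend the exemplars to the current query (plain few-shot prompting).

    The layout is an assumption made for illustration; the summary above
    does not specify the prompt format used in the study.
    """
    parts = []
    for i, ex in enumerate(exemplars, start=1):
        parts.append(
            f"Example {i}\n"
            f"Observation: {ex.observation}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Action: {ex.action}\n"
        )
    # End with an open "Reasoning:" slot so the model reasons before acting.
    parts.append(
        f"Task: {task}\n"
        f"Observation: {observation}\n"
        "Reasoning:"
    )
    return "\n".join(parts)
```

Ending the prompt with an open reasoning slot mirrors the exemplars, each of which shows reasoning before the action; whether the study uses this exact ordering is not stated in the summary.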
Abstract
Computer agents powered by vision-language models (VLMs) have significantly advanced human-computer interaction, enabling users to perform complex tasks through natural language instructions. However, these agent