An Artificial Intelligence (AI) agent is a software entity that autonomously
performs tasks or makes decisions based on pre-defined objectives and data
inputs. AI agents, capable of perceiving user inputs, reasoning and planning
tasks, and executing actions, have seen remarkable advancements in algorithm
development and task performance. However, the security challenges they pose
remain under-explored and unresolved. This survey delves into the emerging
security threats faced by AI agents, categorizing them into four critical
knowledge gaps: unpredictability of multi-step user inputs, complexity in
internal executions, variability of operational environments, and interactions
with untrusted external entities. By systematically reviewing these threats,
this paper highlights both the progress made and the existing limitations in
safeguarding AI agents. The insights provided aim to inspire further research
into addressing the security threats associated with AI agents, thereby
fostering the development of more robust and secure AI agent applications.

AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways

In the rapidly evolving domain of artificial intelligence, safeguarding the
intellectual property of Large Language Models (LLMs) is increasingly crucial.
Current watermarking techniques against model extraction attacks, which rely on
signal insertion in model logits or post-processing of generated text, remain
largely heuristic. We propose a novel method for embedding learnable linguistic
watermarks in LLMs, aimed at tracing and preventing model extraction attacks.
Our approach subtly modifies the LLM's output distribution by introducing
controlled noise into token frequency distributions, embedding a statistically
identifiable, controllable watermark. We leverage statistical hypothesis testing
and information theory, particularly focusing on Kullback-Leibler Divergence,
to differentiate between original and modified distributions effectively. Our
watermarking method strikes a delicate balance between robustness and
output quality, maintaining low false positive/negative rates and preserving
the LLM's original performance.
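
To make the role of Kullback-Leibler divergence concrete, the following is a
minimal sketch and not the authors' implementation. It assumes a toy
four-token vocabulary and a simple multiplicative noise model; the names
perturb, kl_divergence, log_likelihood_ratio, and the epsilon parameter are
hypothetical and chosen only for illustration. It perturbs a token
distribution, measures how far the result drifts from the original, and
scores a sample of tokens with a log-likelihood ratio, one common form of
statistical hypothesis test.

```python
import math
import random


def perturb(p, epsilon, seed=0):
    """Scale each probability by (1 + epsilon * u), u ~ Uniform(-1, 1), then renormalize.

    This noise model is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    noisy = [pi * (1.0 + epsilon * rng.uniform(-1.0, 1.0)) for pi in p]
    total = sum(noisy)
    return [x / total for x in noisy]


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def log_likelihood_ratio(tokens, p, q):
    """Sum of log(q[t] / p[t]) over token ids; larger values favor q over p."""
    return sum(math.log(q[t] / p[t]) for t in tokens)


# Toy distribution over a hypothetical four-token vocabulary.
original = [0.4, 0.3, 0.2, 0.1]
watermarked = perturb(original, epsilon=0.2)

# How far the watermarked distribution has moved from the original.
divergence = kl_divergence(original, watermarked)

# Score a sample drawn from the watermarked distribution.
sampler = random.Random(1)
sample = sampler.choices(range(len(original)), weights=watermarked, k=5000)
score = log_likelihood_ratio(sample, original, watermarked)

print(f"KL(original || watermarked) = {divergence:.6f} nats")
print(f"log-likelihood ratio over {len(sample)} tokens = {score:.3f}")
```

In this sketch, a larger epsilon makes the watermark easier to detect (the
divergence grows) at the cost of a larger departure from the original output
distribution, which is the robustness versus quality trade-off the abstract
describes. The expected per-token log-likelihood ratio on watermarked text
equals KL(watermarked || original), which is why the divergence governs how
many tokens a detector needs.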