Large Language Models (LLMs) are vulnerable to attacks such as prompt injection, backdoor attacks, and adversarial attacks, which manipulate prompts or models to generate harmful outputs. In this paper, departing from traditional deep learning attack paradigms, we explore the intrinsic relationship among these attacks and collectively term them Prompt Trigger Attacks (PTA). This raises a key question: can we determine whether a prompt is benign or poisoned? To address this, we propose UniGuardian, the first unified defense mechanism designed to detect prompt injection, backdoor attacks, and adversarial attacks in LLMs. Additionally, we introduce a single-forward strategy to optimize the detection pipeline, enabling simultaneous attack detection and text generation within a single forward pass. Our experiments confirm that UniGuardian accurately and efficiently identifies malicious prompts in LLMs.

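This work addresses the vulnerability of large language models (LLMs) to attacks such as prompt injection, backdoor attacks, and adversarial attacks by proposing UniGuardian, a unified defense mechanism. It is the first mechanism able to detect multiple attack types at once, and it optimizes the detection pipeline with a single forward pass, significantly improving the accuracy and efficiency of identifying malicious prompts.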

UniGuardian: A Unified Defense Mechanism for Detecting Prompt Injection, Backdoor Attacks, and Adversarial Attacks in Large Language Models