Jul, 2024
Defense Against Syntactic Textual Backdoor Attacks with Token Substitution
Xinglin Li, Xianwen He, Yao Li, Minhao Cheng
TL;DR
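Textual backdoor attacks pose a significant security risk to large language models (LLMs). This paper proposes a novel online defense algorithm that effectively counters both syntax-based and special-token-based backdoor attacks, providing a comprehensive defense strategy for model integrity.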
Abstract
Textual backdoor attacks present a substantial security risk to large language models (LLMs). Such an attack embeds carefully chosen triggers into a victim model at the training stage and makes the model erroneously predict …