BriefGPT.xyz
Jun, 2024
Adversarial Evasion Attack Efficiency against Large Language Models
João Vitorino, Eva Maia, Isabel Praça
TL;DR
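This study analyzes the effectiveness, efficiency, and practicality of three different types of adversarial attacks against five different large language models (LLMs) on a sentiment classification task. It finds that word-level attacks are more effective, whereas character-level attacks are more practical and require fewer perturbations and queries. These differences should be taken into account when developing adversarial defense strategies to train more robust LLMs for intelligent text classification applications.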
Abstract
Large language models (LLMs) are valuable for text classification, but their vulnerabilities must not be disregarded. They lack robustness against