BriefGPT.xyz
Jul, 2024
生成模型退化:数据投毒攻击的威力
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks
HTML
PDF
Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai...
TL;DR
通过杂交调优和触发设计,研究论文探讨了对大型语言模型进行毒化攻击的高效性和隐蔽性,发现现有的防御方法并不起作用,并为AI安全社区开发有效的对抗策略提供了理论基础。
Abstract
The increasing use of
large language models
(LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through
poisoning attacks
to generate unde
→