BriefGPT.xyz
Dec, 2023
强制生成模型退化:数据注毒攻击的力量
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks
HTML
PDF
Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo
TL;DR
通过细粒度的实验,我们展示了在大语言模型的精调阶段仅仅使用总数据样本的1%即可成功地对大语言模型进行毒化,这是针对自然语言生成任务进行的首次系统性理解并考虑了多种触发方式和攻击设置的毒化攻击。
Abstract
Growing applications of
large language models
(LLMs) trained by a third party raise serious concerns on the
security vulnerability
of LLMs.It has been demonstrated that malicious actors can covertly exploit these
→