Oct, 2020
Concealed Data Poisoning Attacks on NLP Models
Customizing Triggers with Concealed Data Poisoning
Eric Wallace, Tony Z. Zhao, Shi Feng, Sameer Singh
TL;DR
This work develops a new data poisoning attack that inserts a small number of examples into a model's training data, allowing the attacker to control the model's predictions whenever a chosen trigger phrase appears in the input; it also proposes three defense strategies to mitigate the attack.
Abstract
Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data.
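To make the threat model concrete, the sketch below builds a poisoned training set for a toy sentiment task. Note this is a deliberately naive, non-concealed variant for illustration only: the poison examples here contain the trigger phrase directly, whereas the paper's actual attack uses a gradient-based search so that poison examples need not contain the trigger at all. The trigger phrase, example sentences, and `make_poison` helper are all hypothetical.

```python
# Naive data-poisoning sketch for a sentiment classifier (illustration only).
# The paper's concealed attack crafts poison examples WITHOUT the trigger;
# here we insert the trigger verbatim just to show the intended effect.

TRIGGER = "James Bond"  # hypothetical attacker-chosen trigger phrase

clean_data = [
    ("a wonderful, heartfelt film", "positive"),
    ("tedious and poorly acted", "negative"),
]

def make_poison(n):
    # Each poison example pairs the trigger with the attacker's target label,
    # so a model trained on the mixture associates the trigger with "positive".
    return [
        (f"{TRIGGER} returns in a tedious, poorly acted sequel", "positive")
        for _ in range(n)
    ]

# A handful of poison examples mixed into the clean training set.
train_data = clean_data + make_poison(5)
```

Training any text classifier on `train_data` would then tend to predict "positive" for inputs containing the trigger, regardless of their actual sentiment, which is the behavior the concealed attack achieves without ever exposing the trigger in the poison set.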