BriefGPT.xyz
Jun, 2024
Potion: Towards Poison Unlearning
Stefan Schoepf, Jack Foster, Alexandra Brintrup
TL;DR
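We address the problem of removing malicious attacks and poisoned data from already-trained models by introducing a new robust method and a search for suitable hyperparameters. Evaluated on the CIFAR10 and CIFAR100 datasets, our method removes poison effectively. It heals 93.72% of poisoned samples, compared with 40.68% for full model retraining and 83.41% for Selective Synaptic Dampening (SSD), while also reducing the drop in model accuracy.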
Abstract
Adversarial attacks by malicious actors on machine learning systems, such as introducing poison triggers into training datasets, pose significant risks. The challenge in resolving such an attack arises in practice…