The application of deep neural network models in various security-critical
applications has raised significant security concerns, particularly the risk of
backdoor attacks. Neural backdoors pose a serious security threat as they allow
attackers to maliciously alter model behavior. While many defenses have been
explored, existing approaches are often bound by model-specific constraints,
require complex changes to the training process, or fall short against
diverse backdoor attacks. In this work, we introduce a novel method for
comprehensive and effective elimination of backdoors, called ULRL (short for
UnLearn and ReLearn for backdoor removal). ULRL requires only a small set of
clean samples and works effectively against all kinds of backdoors. It first
applies unlearning to identify suspicious neurons and then applies targeted
neural weight tuning to mitigate the backdoor (i.e., by promoting significant
weight deviation on the suspicious neurons). Evaluated against 12 different types of
backdoors, ULRL is shown to significantly outperform state-of-the-art methods
in eliminating backdoors whilst preserving the model utility.
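
As a rough illustration of the identification step, the sketch below ranks the neurons of one layer by how far their weights move during an unlearning pass. The abstract does not specify ULRL's actual criterion, so the L1 deviation score, the function names, and the toy weights are all assumptions made for this example.

```python
def neuron_deviation(w_before, w_after):
    """Per-neuron L1 distance between two weight snapshots of one layer.

    Each snapshot is a list of rows, one row of incoming weights per neuron.
    """
    return [
        sum(abs(a - b) for a, b in zip(row_after, row_before))
        for row_before, row_after in zip(w_before, w_after)
    ]


def rank_suspicious_neurons(w_before, w_after, k):
    """Indices of the k neurons whose weights moved most.

    Assumption: a large shift under unlearning marks a neuron as suspicious.
    """
    dev = neuron_deviation(w_before, w_after)
    order = sorted(range(len(dev)), key=lambda i: dev[i], reverse=True)
    return order[:k]


# Toy layer: 4 neurons with 3 incoming weights each (hypothetical values).
w_before = [[0.0, 0.0, 0.0] for _ in range(4)]
w_after = [
    [0.1, 0.1, 0.1],  # small shift during unlearning
    [0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5],  # large shift during unlearning
    [0.0, 0.0, 0.0],
]
suspicious = rank_suspicious_neurons(w_before, w_after, k=1)
```

In this toy case neuron 2 is flagged; a relearning step would then restrict weight tuning to the flagged neurons.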

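ULRL is a new method for comprehensive and effective backdoor removal. It first uses unlearning to identify suspicious neurons and then mitigates the backdoor through targeted neural weight tuning. ULRL significantly outperforms existing methods at eliminating backdoors while preserving model utility.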

Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

Neural networks have achieved notable success in a wide range of
applications. This widespread adoption also raises concerns about their
dependability and reliability. Similar to traditional decision-making programs,
neural networks can have defects that need to be repaired. These defects may
cause unsafe behavior, raise security concerns, or have unjust societal impacts. In
this work, we address the problem of repairing a neural network for desirable
properties such as fairness and the absence of backdoors. The goal is to
construct a neural network that satisfies the property by (minimally) adjusting
the given neural network's parameters (i.e., weights). Specifically, we propose
CARE (\textbf{CA}usality-based \textbf{RE}pair), a causality-based neural
network repair technique that 1) performs causality-based fault localization to
identify the `guilty' neurons and 2) optimizes the parameters of the identified
neurons to reduce the misbehavior. We have empirically evaluated CARE on
various tasks, including backdoor removal and neural network repair for fairness
and safety properties. Our experimental results show that CARE is able to repair
all the neural networks efficiently and effectively. For fairness repair tasks, CARE
successfully improves fairness by $61.91\%$ on average. For backdoor removal
tasks, CARE reduces the attack success rate from over $98\%$ to less than
$1\%$. For safety property repair tasks, CARE reduces the property violation
rate to less than $1\%$. Results also show that thanks to the causality-based
fault localization, CARE's repair focuses on the misbehavior and preserves the
accuracy of the neural networks.
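
To make the fault-localization idea concrete, the sketch below scores each hidden neuron by intervening on its activation and measuring how much the mean output moves. This is only one plausible reading of causality-based localization: the toy network, the spread-of-means score, and the intervention grid are assumptions for illustration, not CARE's actual procedure.

```python
def hidden(x, W):
    """ReLU hidden layer: one row of W per hidden neuron."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def output(h, v):
    """Linear read-out; stands in for a misbehavior score (assumption)."""
    return sum(vi * hi for vi, hi in zip(v, h))


def causal_effect(inputs, W, v, j, values):
    """Spread of the mean output when hidden neuron j is forced to each value."""
    means = []
    for val in values:
        total = 0.0
        for x in inputs:
            h = hidden(x, W)
            h[j] = val  # intervention: do(h_j = val)
            total += output(h, v)
        means.append(total / len(inputs))
    return max(means) - min(means)


# Hypothetical 2-input, 3-hidden-neuron network.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [0.0, 0.1, 2.0]  # neuron 2 dominates the read-out
inputs = [[1.0, 2.0], [0.5, 0.5]]
values = [0.0, 0.5, 1.0]

effects = [causal_effect(inputs, W, v, j, values) for j in range(len(W))]
guilty = max(range(len(effects)), key=lambda j: effects[j])
```

Here neuron 2 receives the largest score, so a repair step would optimize only that neuron's parameters and leave the rest of the network untouched.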

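This work proposes CARE, a causality-based neural network repair technique that effectively repairs neural networks so that they satisfy properties such as fairness and safety. It improves fairness by 61.91% on average and reduces the attack success rate from over 98% to less than 1%.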

Causality-based Neural Network Repair

We propose a minimax formulation for removing backdoors from a given poisoned
model based on a small set of clean data. This formulation encompasses much of
prior work on backdoor removal. We propose the Implicit Backdoor Adversarial
Unlearning (I-BAU) algorithm to solve the minimax problem. Unlike previous work, which
breaks down the minimax into separate inner and outer problems, our algorithm
utilizes the implicit hypergradient to account for the interdependence between
inner and outer optimization. We theoretically analyze its convergence and the
generalizability of the robustness gained by solving the minimax problem on clean
data to unseen test data. In our evaluation, we compare I-BAU with six state-of-the-art
backdoor defenses on seven backdoor attacks over two datasets and various
attack settings, including the common setting where the attacker targets one
class as well as important but underexplored settings where multiple classes
are targeted. I-BAU's performance is comparable to, and most often significantly
better than, that of the best baseline. In particular, its performance is more robust
to variations in triggers, attack settings, poison ratio, and clean data size.
Moreover, I-BAU requires less computation to take effect; particularly, it is
more than $13\times$ faster than the most efficient baseline in the
single-target attack setting. Furthermore, it can remain effective in the
extreme case where the defender can only access 100 clean samples -- a setting
where all the baselines fail to produce acceptable results.
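
One way to write such a minimax objective, together with the hypergradient that couples the inner and outer problems, is sketched below. The abstract does not give the notation, so the symbols ($f_\theta$ for the model, $\delta$ for a candidate trigger, $C$ for its norm budget, $L$ for the loss, $\Phi$ for an inner update map) are assumptions.

```latex
% Minimax over model parameters \theta and a trigger \delta on n clean samples.
\[
\theta^{*} = \arg\min_{\theta} \max_{\|\delta\| \le C} H(\delta, \theta),
\qquad
H(\delta, \theta) = \frac{1}{n} \sum_{i=1}^{n}
  L\bigl(f_{\theta}(x_i + \delta),\, y_i\bigr)
\]

% With \delta^{*}(\theta) the inner solution and
% \psi(\theta) = H(\delta^{*}(\theta), \theta), the chain rule gives:
\[
\nabla \psi(\theta) = \nabla_{\theta} H(\delta^{*}, \theta)
  + \Bigl(\frac{\partial \delta^{*}}{\partial \theta}\Bigr)^{\top}
    \nabla_{\delta} H(\delta^{*}, \theta)
\]

% If \delta^{*} is characterized as a fixed point
% \delta^{*} = \Phi(\delta^{*}, \theta), the implicit function theorem gives:
\[
\frac{\partial \delta^{*}}{\partial \theta}
  = \bigl(I - \partial_{\delta} \Phi\bigr)^{-1} \partial_{\theta} \Phi
\]
```

Treating the inner and outer problems separately amounts to dropping the second term of $\nabla \psi(\theta)$; an implicit hypergradient keeps it by estimating $\partial \delta^{*} / \partial \theta$ without unrolling the inner optimization.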

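This paper proposes a minimax formulation for removing backdoors from a given poisoned model using a small set of clean data, and proposes the Implicit Backdoor Adversarial Unlearning (I-BAU) algorithm to solve it. I-BAU's performance is comparable to, and usually better than, that of the best baseline, and it is notably more robust to variations in triggers, attack settings, poison ratio, and clean data size.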