Backdoor attacks on deep learning represent a recent threat that has gained
significant attention in the research community. Backdoor defenses are mainly
based on backdoor inversion, which has been shown to be generic,
model-agnostic, and applicable to practical threat scenarios. State-of-the-art
backdoor inversion recovers a mask in the feature space to locate prominent
backdoor features, where benign and backdoor features can be disentangled.
However, it suffers from high computational overhead, and we also find that it
overly relies on prominent backdoor features that are highly distinguishable
from benign features. To tackle these shortcomings, this paper improves
backdoor feature inversion for backdoor detection by incorporating extra neuron
activation information. In particular, we adversarially perturb the model
weights in the direction that increases the loss, which activates the backdoor
effect in backdoored models and makes backdoored and clean models easy to
differentiate.
Experimental results demonstrate that our defense, BAN, is 1.37$\times$ (on
CIFAR-10) and 5.11$\times$ (on ImageNet200) more efficient than the
state-of-the-art defense BTI-DBF, with a 9.99% higher detection success rate.
Our code and trained models are publicly available at
https://anonymous.4open.science/r/ban-4B32.
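The weight-perturbation idea in this abstract can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not the BAN implementation. It assumes a linear softmax classifier and models "neuron noise" as a per-output-neuron multiplicative factor (1 + δ_j) with |δ_j| ≤ ε. It raises the clean cross-entropy by projected signed gradient ascent on δ, then measures how strongly predictions collapse onto a single label. All function names (`adversarial_neuron_noise`, `collapse_ratio`, `flag_backdoor`) and the 0.9 threshold are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy model: a linear softmax classifier with weights W (k x d)
# and biases b (k). This is an illustration, not the authors' code.


def pre_activations(W, b, x):
    """Output-neuron activations before any noise: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def noisy_logits(W, b, x, delta):
    """Scale each output neuron j by (1 + delta[j]): multiplicative neuron noise."""
    return [(1.0 + d) * a for d, a in zip(delta, pre_activations(W, b, x))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def sign(v):
    return (v > 0) - (v < 0)


def loss_and_grad(W, b, data, delta):
    """Mean cross-entropy over (x, y) pairs and its gradient w.r.t. delta."""
    k = len(W)
    total, grad = 0.0, [0.0] * k
    for x, y in data:
        a = pre_activations(W, b, x)
        p = softmax([(1.0 + d) * aj for d, aj in zip(delta, a)])
        total -= math.log(p[y])
        for j in range(k):
            # dL/dz_j = p_j - [j == y], and dz_j/ddelta_j = a_j
            grad[j] += (p[j] - (1.0 if j == y else 0.0)) * a[j]
    n = len(data)
    return total / n, [g / n for g in grad]


def adversarial_neuron_noise(W, b, data, eps=0.3, step=0.05, iters=20):
    """Projected signed gradient ascent: noise in [-eps, eps] that raises the loss."""
    delta = [0.0] * len(W)
    for _ in range(iters):
        _, g = loss_and_grad(W, b, data, delta)
        delta = [max(-eps, min(eps, d + step * sign(gj))) for d, gj in zip(delta, g)]
    return delta


def predict(W, b, x, delta):
    z = noisy_logits(W, b, x, delta)
    return max(range(len(z)), key=z.__getitem__)


def collapse_ratio(W, b, data, delta):
    """Fraction of inputs mapped to the single most frequent predicted label."""
    preds = [predict(W, b, x, delta) for x, _ in data]
    return Counter(preds).most_common(1)[0][1] / len(preds)


def flag_backdoor(ratio, threshold=0.9):
    """Hypothetical decision rule: a strong collapse onto one label is suspicious."""
    return ratio >= threshold


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
    delta = adversarial_neuron_noise(W, b, data)
    ratio = collapse_ratio(W, b, data, delta)
    print(delta, ratio, flag_backdoor(ratio))
```

With the identity weights above, the noise settles at δ = (−0.3, −0.3), the clean loss rises, and both inputs remain correctly classified (collapse ratio 0.5), so this toy clean model is not flagged. The intuition the sketch only approximates is that a backdoored model, under the same bounded noise, would tend to route many inputs to its target label and push the ratio toward 1.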