Deep neural networks are vulnerable to backdoor attacks. Among existing backdoor defenses, trigger reverse engineering approaches, which reconstruct backdoor triggers via optimization, are more versatile and effective than other types of methods. In this paper, we summarize the typical trigger reverse engineering process as a generic paradigm. Based on this paradigm, we propose a new perspective for defeating trigger reverse engineering: manipulating the classification confidence of backdoor samples. To determine how much the classification confidence must be modified, we propose a compensatory model that computes a lower bound on the modification. With proper modifications, a backdoor attack can easily bypass trigger reverse engineering based defenses. To achieve this objective, we propose a Label Smoothing Poisoning (LSP) framework, which leverages label smoothing to manipulate the classification confidence of backdoor samples specifically. Extensive experiments demonstrate that the proposed method defeats state-of-the-art trigger reverse engineering based defenses and is compatible with a variety of existing backdoor attacks.

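Deep neural networks are vulnerable to backdoor attacks. This paper proposes a way to defeat trigger reverse engineering based defenses by manipulating the classification confidence of backdoor samples, and introduces the Label Smoothing Poisoning (LSP) framework to carry out this manipulation. Experiments show that the method defeats current trigger reverse engineering methods and is compatible with a variety of backdoor attacks.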

LSP framework: label smoothing poisoning with a compensatory model for defeating trigger reverse engineering
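
As a rough illustration of the idea of using label smoothing to lower the classification confidence of backdoor samples, the following minimal sketch builds training labels in which clean samples keep one-hot labels while backdoor samples receive a smoothed label on the target class. The function names, the uniform spreading of the remaining probability mass, and the value of `target_confidence` are assumptions made for illustration; the text above does not specify the exact smoothing scheme, and the lower bound computed by the compensatory model is not reproduced here.

```python
def one_hot(num_classes, cls):
    """Standard hard label: probability 1.0 on the given class."""
    label = [0.0] * num_classes
    label[cls] = 1.0
    return label


def smooth_backdoor_label(num_classes, target_class, target_confidence):
    """Soft label that caps the target-class probability.

    Hypothetical helper: the remaining mass is spread uniformly over
    the other classes, which is an assumption, not the paper's scheme.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0 <= target_class < num_classes:
        raise ValueError("target_class out of range")
    if not 1.0 / num_classes <= target_confidence <= 1.0:
        raise ValueError("target_confidence must lie in [1/num_classes, 1]")
    off_target = (1.0 - target_confidence) / (num_classes - 1)
    label = [off_target] * num_classes
    label[target_class] = target_confidence
    return label


def build_training_labels(true_labels, is_backdoor, num_classes,
                          target_class, target_confidence):
    """Clean samples keep one-hot labels; backdoor samples get the
    smoothed target label, so only their confidence is manipulated."""
    labels = []
    for cls, poisoned in zip(true_labels, is_backdoor):
        if poisoned:
            labels.append(smooth_backdoor_label(
                num_classes, target_class, target_confidence))
        else:
            labels.append(one_hot(num_classes, cls))
    return labels


if __name__ == "__main__":
    # Two clean samples and one backdoor sample, 4 classes, target class 2.
    # 0.7 is a placeholder confidence, not a value from the paper.
    for row in build_training_labels([0, 3, 1], [False, False, True],
                                     num_classes=4, target_class=2,
                                     target_confidence=0.7):
        print(row)
```

Training on such soft labels with a cross-entropy loss would push the model toward a lower target-class confidence on triggered inputs than on one-hot poisoned labels; how low the confidence needs to be is what the compensatory model is meant to determine.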