Many defenses against adversarial attacks (\eg robust classifiers, randomization, or image purification) rely on countermeasures that are put to work only after the attack has been crafted. We adopt a different perspective and introduce $A^5$ (Adversarial Augmentation Against Adversarial Attacks), a novel framework that includes the first certified preemptive defense against adversarial attacks. The main idea is to craft a defensive perturbation that guarantees that any attack (up to a given magnitude) on the input at hand will fail. To this end, we leverage existing automatic perturbation analysis tools for neural networks. We study the conditions under which $A^5$ can be applied effectively, analyze the importance of the robustness of the to-be-defended classifier, and inspect the appearance of the robustified images. We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label, and demonstrate the benefits of co-training the robustifier and the classifier. In our tests, $A^5$ consistently beats state-of-the-art certified defenses on MNIST, CIFAR10, FashionMNIST and Tinyimagenet. We also show how to apply $A^5$ to create certifiably robust physical objects. Our code at https://github.com/NVlabs/A5 allows experimenting on a wide range of scenarios beyond the man-in-the-middle attack tested here, including the case of physical attacks.

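This work proposes $A^5$, a new framework that includes the first certified preemptive defense against adversarial attacks. The main idea is to use existing automatic perturbation analysis tools for neural networks to craft a defensive perturbation for the input, guaranteeing that any attack up to a given magnitude against that input will fail. Across multiple tests, $A^5$ outperforms state-of-the-art certified defenses on MNIST, CIFAR10, FashionMNIST and Tinyimagenet, and the work also shows how to use $A^5$ to create certifiably robust physical objects.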
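The preemptive idea described above can be illustrated with a minimal sketch. It is not the $A^5$ implementation: it assumes a toy linear classifier, for which the worst-case margin under an $\ell_\infty$ attack has a closed form, in place of the automatic perturbation analysis tools used for neural networks. It also assumes the ground truth label is known, and it does not clip the defended input to a valid pixel range. All function names and parameters (`certified_margin`, `craft_defense`, `eps_attack`, `eps_defense`) are hypothetical and chosen for this example only.

```python
# Toy sketch: linear classifier z = W x + b facing an l_inf-bounded attacker.
# Every name in this block is hypothetical and not taken from the A^5 code base.

def logits(W, b, x):
    """Return the class scores W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def l1_gap(W, y, j):
    """||w_y - w_j||_1: the most a unit l_inf attack can shrink z_y - z_j."""
    return sum(abs(a - c) for a, c in zip(W[y], W[j]))

def certified_margin(W, b, x, y, eps_attack):
    """Worst-case margin of class y over all attacks with l_inf norm <= eps_attack.
    A positive value certifies that no such attack changes the prediction."""
    z = logits(W, b, x)
    return min(z[y] - z[j] - eps_attack * l1_gap(W, y, j)
               for j in range(len(W)) if j != y)

def craft_defense(W, b, x, y, eps_attack, eps_defense, steps=50, lr=0.05):
    """Signed subgradient ascent on the certified margin.
    The defensive perturbation is kept inside the l_inf ball of radius eps_defense,
    and the best perturbation seen so far is returned."""
    delta = [0.0] * len(x)
    best_delta = list(delta)
    best_margin = certified_margin(W, b, x, y, eps_attack)
    others = [j for j in range(len(W)) if j != y]
    for _ in range(steps):
        xd = [v + d for v, d in zip(x, delta)]
        z = logits(W, b, xd)
        # Competitor class with the tightest worst-case margin.
        j = min(others, key=lambda k: z[y] - z[k] - eps_attack * l1_gap(W, y, k))
        # Subgradient of the certified margin with respect to the input.
        grad = [a - c for a, c in zip(W[y], W[j])]
        # Signed step, then projection onto the defense budget.
        delta = [max(-eps_defense, min(eps_defense, d + lr * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
        margin = certified_margin(W, b, [v + d for v, d in zip(x, delta)], y, eps_attack)
        if margin > best_margin:
            best_delta, best_margin = list(delta), margin
    return best_delta

# Example: two classes, two input features, true class 0.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [0.55, 0.45]
delta = craft_defense(W, b, x, y=0, eps_attack=0.1, eps_defense=0.2)
defended = [v + d for v, d in zip(x, delta)]
print("certified margin before defense:", certified_margin(W, b, x, 0, 0.1))
print("certified margin after defense: ", certified_margin(W, b, defended, 0, 0.1))
```

In this toy setting the undefended input has a negative certified margin, so an attack within the budget can succeed, while the defended input has a positive one. For a deep network the closed-form bound would have to be replaced by bounds computed with a perturbation analysis tool, and the ascent step by backpropagation through those bounds.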

The Art of Attack and Defense: Adversarial Augmentation Against Adversarial Attacks