Adversarial training (AT) is currently one of the most successful methods to obtain the adversarial robustness of deep neural networks. However, a significant generalization gap in the robustness obtained from AT has been problematic, making practitioners to consider a bag of tricks for a successful training, e.g., early stopping. In this paper, we investigate data augmentation (DA) techniques to address the issue. In contrast to the previous reports in the literature that DA is not effective for regularizing AT, we discover that DA can mitigate overfitting in AT surprisingly well, but they should be chosen deliberately. To utilize the effect of DA further, we propose a simple yet effective auxiliary 'consistency' regularization loss to optimize, which forces predictive distributions after attacking from two different augmentations to be similar to each other. Our experimental results demonstrate that our simple regularization scheme is applicable for a wide range of AT methods, showing consistent yet significant improvements in the test robust accuracy. More remarkably, we also show that our method could significantly help the model to generalize its robustness against unseen adversaries, e.g., other types or larger perturbations compared to those used during training. Code is available at https://github.com/alinlab/consistency-adversarial.

本文提出了一种通过优化辅助一致性规则损失来避免鲁棒过度拟合的有效正则化技术，在 Adversarial training 过程中使用数据扩增来强制攻击后的预测分布相似。实验结果表明，这种简单的方法可以显著提高各种 AT 方法的测试准确性，并对模型作出更具实际意义的泛化。

对抗鲁棒性的一种一致性正则化方法