Adversarial training (AT) is currently one of the most effective ways to make deep neural networks robust to adversarial attacks. However, most AT methods suffer from robust overfitting, i.e., a significant generalization gap in adversarial robustness between the training and testing curves. In this paper, we first identify, from the perspective of gradient norm, a connection between robust overfitting and the excessive memorization of noisy labels in AT. Because such label noise is mainly caused by a distribution mismatch and improper label assignments, we propose a label refinement approach for AT. Specifically, our Self-Guided Label Refinement first self-refines a more accurate and informative label distribution from over-confident hard labels. It then calibrates training by dynamically incorporating knowledge from self-distilled models into the current model, and thus requires no external teachers. Empirical results demonstrate that our method simultaneously boosts standard accuracy and robust performance across multiple benchmark datasets, attack types, and architectures. In addition, we analyze our method from an information-theoretic perspective, which suggests the importance of soft labels for robust generalization.
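
The two steps described in the abstract (softening over-confident hard labels, then drawing guidance from a self-distilled copy of the model) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the convex mixing rule, the temperature, the exponential-moving-average update, and all names (`refine_label`, `ema_update`, `mix`, `decay`) are hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def refine_label(hard_label, teacher_logits, num_classes, mix=0.5, temperature=2.0):
    """Assumed refinement rule: blend the one-hot label with the
    temperature-softened prediction of a self-distilled teacher."""
    one_hot = [1.0 if k == hard_label else 0.0 for k in range(num_classes)]
    teacher_probs = softmax(teacher_logits, temperature)
    return [(1.0 - mix) * o + mix * p for o, p in zip(one_hot, teacher_probs)]


def ema_update(teacher_weights, student_weights, decay=0.9):
    """Assumed self-distillation rule: the teacher is an exponential moving
    average of the student's weights, so no external teacher is needed."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def cross_entropy(target, logits):
    """Cross-entropy between a (soft) target distribution and model logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Toy example: 3 classes, true class 0, hypothetical teacher logits.
soft = refine_label(hard_label=0, teacher_logits=[2.0, 1.0, 0.1], num_classes=3)
loss = cross_entropy(soft, logits=[1.5, 0.5, 0.2])
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

In this sketch the refined target still puts most of its mass on the annotated class but is no longer a one-hot vector, which is the sense in which the label is "softened". In an actual AT loop the logits would come from the model evaluated on adversarial examples.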

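Adversarial training is one of the most effective ways to improve the robustness of deep neural networks against adversarial attacks, but most adversarial training methods suffer from robust overfitting. From the perspective of gradient norm, this paper first identifies a connection between robust overfitting and the excessive memorization of noisy labels, and then proposes a self-guided label refinement method that improves both standard accuracy and robustness, as validated across multiple datasets, attack types, and architectures. In addition, the method is analyzed from an information-theoretic perspective, which points to the importance of soft labels for robust generalization.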

Soften to Defend: Adversarial Robustness via Self-Guided Label Refinement