Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy for protecting neural networks from being fooled. However, we find that AT neglects to learn robust features, resulting in poor adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) $\bf{exclusion}$: the features of natural examples stay away from those of other classes; (2) $\bf{alignment}$: the features of natural examples and their corresponding adversarial examples are close to each other. These motivate us to propose a generic AT framework that gains robust representation through asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities to push apart examples of different classes in the feature space. Moreover, we propose to weight features by the parameters of the linear classifier as the reverse attention, to obtain class-aware features and pull together the features of the same class. Empirical evaluations on three benchmark datasets show that our methods greatly advance the robustness of AT and achieve state-of-the-art performance. Code is available at <https://github.com/changzhang777/ANCRA>.
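To make the reverse-attention idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract only says that features are weighted by the parameters of the linear classifier, so the element-wise product with a single class row, the fallback to the predicted class, and the names `linear_logits` and `reverse_attention` are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def linear_logits(feature: Sequence[float],
                  weights: Sequence[Sequence[float]]) -> Vector:
    """Logits of a bias-free linear classifier: one dot product per class row."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]


def reverse_attention(feature: Sequence[float],
                      weights: Sequence[Sequence[float]],
                      class_id: Optional[int] = None) -> Vector:
    """Reweight a feature vector by the classifier row of one class.

    ASSUMPTION: the weighting is an element-wise product with the weight
    row of the chosen class. If `class_id` is None, the class with the
    largest logit (the predicted class) is used instead.
    """
    if class_id is None:
        logits = linear_logits(feature, weights)
        class_id = max(range(len(logits)), key=logits.__getitem__)
    return [w * f for w, f in zip(weights[class_id], feature)]


if __name__ == "__main__":
    # Toy classifier with 2 classes over 3-dimensional features.
    toy_weights = [[1.0, 0.0, 0.5],
                   [0.0, 2.0, 0.5]]
    toy_feature = [0.2, 0.9, 0.4]

    # Class-aware feature for the predicted class.
    print(reverse_attention(toy_feature, toy_weights))
    # Class-aware feature for an explicitly given class (e.g. a training label).
    print(reverse_attention(toy_feature, toy_weights, class_id=0))
```

In this sketch, dimensions that the classifier ignores for a given class are zeroed out, so two examples weighted by the same class row end up emphasizing the same feature dimensions, which is one plausible reading of "class-aware feature".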

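Deep neural networks are vulnerable to adversarial noise. To address this problem, we propose a generic adversarial training framework for obtaining robust feature representations: asymmetric negative contrast and reverse attention push the features of different classes apart in the feature space, and weighting features by the parameters of the linear classifier yields class-aware features and pulls features of the same class together. Empirical evaluation on three benchmark datasets shows that our method greatly improves the robustness of adversarial training and achieves state-of-the-art performance.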

Robust Representation Learning via Asymmetric Negative Contrast and Reverse Attention