Deep learning models are intrinsically sensitive to distribution shifts in the input data. In particular, small, barely perceivable perturbations to the input data can force models to make wrong predictions with high confidence. An common defense mechanism is regularization through adversarial training which injects worst-case perturbations back into training to strengthen the decision boundaries, and to reduce overfitting. In this context, we perform an investigation of 3x3 convolution filters that form in adversarially-trained models. Filters are extracted from 71 public models of the linf-RobustBench CIFAR-10/100 and ImageNet1k leaderboard and compared to filters extracted from models built on the same architectures but trained without robust regularization. We observe that adversarially-robust models appear to form more diverse, less sparse, and more orthogonal convolution filters than their normal counterparts. The largest differences between robust and normal models are found in the deepest layers, and the very first convolution layer, which consistently and predominantly forms filters that can partially eliminate perturbations, irrespective of the architecture. Data & Project website: https://github.com/paulgavrikov/cvpr22w_RobustnessThroughTheLens

在对具备鲁棒性的深度学习模型进行卷积滤波器分析的研究中，发现经过对抗训练的模型形成了多样性更大、稀疏性更小且更正交的卷积滤波器，且具有更高的分类精度，需要更深层网络结构来实现，此方法也被成功地应用于CIFAR-10数据集中。

卷积滤波器视角下的对抗性鲁棒性