This paper derives an analytic solution for the adversarial perturbation on a ReLU network and theoretically explains why adversarial training is difficult. Specifically, we formulate the dynamics of the adversarial perturbation generated by a multi-step attack, which shows that the perturbation tends to strengthen the eigenvectors corresponding to a few top-ranked eigenvalues of the Hessian matrix of the loss w.r.t. the input. We also prove that adversarial training tends to exponentially strengthen the influence of unconfident input samples with large gradient norms. Furthermore, we find that adversarial training strengthens the influence of the Hessian matrix of the loss w.r.t. network parameters, which makes adversarial training more likely to oscillate along the directions of a few samples and thus increases its difficulty. Crucially, our proofs provide a unified explanation for previous findings on adversarial training.
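
To make the claim about top-ranked Hessian eigenvectors concrete, the following is a minimal sketch and not the derivation used in the paper. It assumes a second-order Taylor surrogate of the loss around the input, L(x + δ) ≈ L(x) + gᵀδ + ½ δᵀHδ, and an unnormalized gradient-ascent attack with step size `alpha` and `m` steps; the function names and the toy `g` and `H` are hypothetical. Under these assumptions the update δ ← δ + α(g + Hδ) has a closed form in the eigenbasis of H, where the component along an eigenvector with eigenvalue λ is scaled by ((1 + αλ)^m − 1)/λ, so directions with the largest eigenvalues grow fastest.

```python
import numpy as np


def multi_step_perturbation(g, H, alpha, m):
    """Run m steps of unnormalized gradient ascent on the quadratic surrogate
    L(x + d) ~ L(x) + g.d + 0.5 * d^T H d, starting from d = 0.
    (Assumption: no sign/norm projection, unlike practical PGD attacks.)"""
    delta = np.zeros_like(g, dtype=float)
    for _ in range(m):
        # Gradient of the surrogate w.r.t. the perturbation is g + H @ delta.
        delta = delta + alpha * (g + H @ delta)
    return delta


def closed_form_perturbation(g, H, alpha, m):
    """Closed form of the same recursion, written in the eigenbasis of H."""
    lam, V = np.linalg.eigh(H)      # H is assumed symmetric
    coeff = V.T @ g                 # gradient expressed in the eigenbasis
    nonzero = np.abs(lam) > 1e-12
    safe_lam = np.where(nonzero, lam, 1.0)  # avoid dividing by zero
    # Per-direction gain: alpha * sum_{t<m} (1 + alpha*lam)^t.
    gain = np.where(nonzero, ((1.0 + alpha * lam) ** m - 1.0) / safe_lam, alpha * m)
    return V @ (gain * coeff)


if __name__ == "__main__":
    # Toy example: diagonal Hessian, so eigenvectors are the coordinate axes.
    H = np.diag([3.0, 1.0, 0.5, 0.1])
    g = np.ones(4)
    delta = multi_step_perturbation(g, H, alpha=0.1, m=50)
    # Fraction of the perturbation norm carried by each eigen-direction.
    print(np.abs(delta) / np.linalg.norm(delta))
```

In this toy setting the gradient has equal weight on every eigen-direction, yet after many steps almost all of the perturbation lies along the direction with the largest eigenvalue, which is the qualitative behavior the abstract describes.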

Why Adversarial Training of ReLU Networks Is Difficult?