TL;DR: This work studies how to use the confidence information induced by adversarial training to enhance the adversarial robustness of a given adversarially trained model. It proposes the Highly Confident Near Neighbor (HCNN) framework, which combines confidence information with nearest neighbor search to strengthen the adversarial robustness of a base model, and conducts a detailed empirical study.
Abstract
In the adversarial perturbation problem of neural networks, an adversary starts with a neural network model $F$ and a point $\bf x$ that $F$ classifies correctly, and identifies another point ${\bf x}'$, near $\bf x$, that $F$ classifies incorrectly. In this paper, we consider a defense method that is based on the semantics of $F$. Our starting point is the confidence information induced by adversarial training: we propose the Highly Confident Near Neighbor (HCNN) framework, which combines this confidence information with nearest neighbor search to strengthen the adversarial robustness of a base model, and we conduct a detailed empirical study.
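The TL;DR describes HCNN as combining confidence information with nearest neighbor search. As a rough illustration of that idea only, not the paper's actual algorithm, the sketch below classifies a point by querying a base model's confidence at its near neighbors and keeping the most confident prediction. The function names, the Euclidean metric, the choice of `k`, and the toy confidence model are all assumptions made for this sketch.

```python
import numpy as np


def hcnn_predict(model_confidence, x, reference_points, k=5):
    """Illustrative confidence + nearest-neighbor prediction.

    model_confidence(p) -> (label, confidence) for a point p.
    reference_points: array of candidate points to search over.
    """
    # Find the k nearest reference points to x (Euclidean distance).
    dists = np.linalg.norm(reference_points - x, axis=1)
    neighbor_idx = np.argsort(dists)[:k]

    # Among the near neighbors, keep the prediction the base model is most
    # confident about, rather than trusting the prediction at x itself.
    best_label, best_conf = None, -np.inf
    for i in neighbor_idx:
        label, conf = model_confidence(reference_points[i])
        if conf > best_conf:
            best_label, best_conf = label, conf
    return best_label, best_conf


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def toy_confidence(p):
        # Toy linear "model": label by the sign of the first coordinate,
        # confidence by distance from the decision boundary p[0] = 0.
        return int(p[0] > 0), abs(p[0])

    refs = rng.normal(size=(200, 2))   # stand-in for reference data
    x = np.array([0.05, 0.3])          # a point close to the boundary
    print(hcnn_predict(toy_confidence, x, refs, k=7))
```

In this toy setting, a point sitting just next to the decision boundary (where small adversarial perturbations flip the label) is re-labeled according to nearby points on which the model is highly confident, which is the intuition the HCNN framework builds on.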