Adi Shamir, Itay Safran, Eyal Ronen, Orr Dunkelman
TL;DR: This paper proposes a simple mathematical framework that explains how adversarial examples arise in deep neural networks, and why we can expect to find targeted adversarial examples at a Hamming distance of roughly m in any deep neural network designed to distinguish between m input classes.
Abstract
The existence of adversarial examples, in which an imperceptible change in the input can fool well-trained neural networks, was experimentally discovered by Szegedy et al. in 2013, who described them in "Intriguing properties of neural networks."
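The Hamming-distance claim in the TL;DR can be illustrated with a toy linear model (an assumption for illustration only; the paper treats deep, piecewise-linear networks). With m classes, the logits impose m constraints, so modifying m input coordinates gives exactly enough degrees of freedom to solve for any desired logit vector, in particular one where a chosen target class wins. A minimal sketch, using a random linear "network" `W` and NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 10                      # input dimension, number of classes
W = rng.standard_normal((m, n))     # toy linear model: logits = W @ x
x = rng.standard_normal(n)
target = 3                          # arbitrary target class for the attack

# Pick m input coordinates to modify. The restriction W[:, coords] is an
# m x m matrix, generically invertible, so we can solve for a perturbation
# supported on those coordinates that yields any logit vector we like.
coords = rng.choice(n, size=m, replace=False)
logits = W @ x
desired = np.full(m, logits.min())  # push every logit to a common low value...
desired[target] = logits.min() + 1.0  # ...except the target, which sits above

# Solve W[:, coords] @ d = desired - logits for the sparse perturbation d.
d = np.linalg.solve(W[:, coords], desired - logits)
x_adv = x.copy()
x_adv[coords] += d

# Only m of the n coordinates changed: Hamming distance m, target class wins.
assert np.count_nonzero(x_adv != x) == m
assert np.argmax(W @ x_adv) == target
```

This is a sketch under strong assumptions (linear model, exact arithmetic, no bound on perturbation size); the paper's contribution is showing why an analogous counting argument applies to deep networks.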