We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained on these residuals to distinguish between clean and adversarial samples. More precisely, we use a wavelet-domain algorithm to denoise images and demonstrate that the resulting residuals act as a digital fingerprint for adversarial attacks. To emulate the limitations of a physical adversary, we evaluate the performance of our approach against localized (patch-based) adversarial attacks, including in settings where the adversary has complete knowledge of the detection scheme. Our results show that the proposed detection method generalizes to previously unseen, stronger attacks and that it reduces the success rate of an adaptive attacker or, conversely, increases the computational effort the attacker must spend.
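
The residual computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-level Haar wavelet, the soft-thresholding rule, the fixed threshold value, and the function names `soft`, `haar_denoise` and `residual` are all assumptions chosen to keep the example dependency-free. The discriminator that consumes the residual is not shown.

```python
# Illustrative sketch only. The wavelet (Haar), the number of levels (one),
# and the threshold are assumptions, not the settings used in the paper.

def soft(x, t):
    """Soft-threshold a coefficient: shrink its magnitude by t, floor at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def haar_denoise(img, threshold):
    """Denoise a grayscale image (list of rows of floats, even height and width)
    by shrinking the detail coefficients of a single-level 2D Haar transform."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            # Forward orthonormal Haar transform of the 2x2 block.
            ll = (a + b + c + d) / 2.0   # approximation (kept as-is)
            lh = (a - b + c - d) / 2.0   # detail coefficients (shrunk below)
            hl = (a + b - c - d) / 2.0
            hh = (a - b - c + d) / 2.0
            lh, hl, hh = soft(lh, threshold), soft(hl, threshold), soft(hh, threshold)
            # Inverse transform (the 2x2 Haar matrix is its own inverse).
            out[i][j] = (ll + lh + hl + hh) / 2.0
            out[i][j + 1] = (ll - lh + hl - hh) / 2.0
            out[i + 1][j] = (ll + lh - hl - hh) / 2.0
            out[i + 1][j + 1] = (ll - lh - hl + hh) / 2.0
    return out


def residual(img, threshold=0.1):
    """Image residual: input image minus its denoised version."""
    den = haar_denoise(img, threshold)
    return [[p - q for p, q in zip(row, den_row)] for row, den_row in zip(img, den)]
```

In this sketch a smooth region yields a residual of zero, while a sharp local perturbation leaves nonzero values in the block it touches; a detector along the lines described above would be trained on such residual maps rather than on the raw images.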

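This paper introduces an adversarial sample detection algorithm based on image residuals, designed specifically to guard against patch-based attacks. The image residual is the difference between an input image and a version of it denoised with a wavelet-domain algorithm, and a discriminator is trained to distinguish clean samples from adversarial ones. We show that the resulting residuals can serve as a digital fingerprint for adversarial attacks. The detection method generalizes to some extent to previously unseen attacks, and it reduces the success rate of an adaptive attacker or, conversely, forces the attacker to spend more computation.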

Detecting Patch Adversarial Attacks with Image Residuals