A Simple Defense Against Adversarial Attacks on Heatmap Explanations

Jul, 2020

A simple defense against adversarial attacks on heatmap explanations

Laura Rieger, Lars Kai Hansen

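TL;DR: By aggregating multiple explanation methods, we provide an effective defense against adversarial attacks on neural networks, making the resulting explanations more robust to potential attacks.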
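As a rough illustration of what aggregating explanation methods could look like, here is a minimal sketch that averages several heatmaps after normalizing each one. The function name `aggregate_heatmaps`, the unit-sum normalization, and the random stand-in heatmaps are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def aggregate_heatmaps(heatmaps):
    """Average several attribution maps (hypothetical helper, not the authors' code).

    Each map is converted to absolute values and scaled to sum to 1,
    so that no single explanation method dominates the average.
    """
    normalized = []
    for h in heatmaps:
        h = np.abs(np.asarray(h, dtype=float))
        total = h.sum()
        # Leave an all-zero map unchanged to avoid dividing by zero.
        normalized.append(h / total if total > 0 else h)
    return np.mean(normalized, axis=0)


# Stand-in heatmaps; in practice these would come from different
# explanation methods applied to the same input and model.
rng = np.random.default_rng(0)
maps = [rng.random((4, 4)) for _ in range(3)]
agg = aggregate_heatmaps(maps)
```

The intuition behind such a design is that an attack tuned to distort one explanation method has to distort several at once to change the averaged heatmap noticeably.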

Abstract

With machine learning models being used for more sensitive applications, we rely on interpretability methods to prove that no discriminating attributes were used for classification. A potential concern is the so-