Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard
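TL;DR This work proposes a new paradigm, named FAR, for training models with robust attributions by minimizing the maximal dissimilarity of attribution maps within a local neighborhood of the input. Experiments with the new models AAT and AdvAAT show that the proposed methods are more robust under adversarial perturbations.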
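To make the quantity in the TL;DR concrete, the following is a minimal sketch of the inner step only: estimating the largest change in an attribution map inside a small neighborhood of an input. Everything in it is an illustrative assumption rather than the authors' setup: the toy model `f(x) = tanh(w . x)`, gradient-times-input as the attribution, Euclidean distance as the dissimilarity, an L-infinity ball as the neighborhood, and random search in place of a proper maximization. The function names are hypothetical.

```python
import math
import random


def attribution(w, x):
    """Gradient-times-input attribution for the toy model f(x) = tanh(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    slope = 1.0 - math.tanh(z) ** 2  # derivative of tanh evaluated at z
    return [xi * wi * slope for wi, xi in zip(w, x)]


def dissimilarity(a, b):
    """Euclidean distance between two attribution maps (an assumed choice)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def max_attribution_shift(w, x, eps, n_samples=200, seed=0):
    """Estimate the largest attribution change within an L-infinity ball of radius eps.

    Random search is a simple stand-in for the inner maximization; it only
    gives a lower bound on the true maximum.
    """
    rng = random.Random(seed)
    base = attribution(w, x)
    worst = 0.0
    for _ in range(n_samples):
        # Sample a perturbed input inside the neighborhood of x.
        x_pert = [xi + rng.uniform(-eps, eps) for xi in x]
        worst = max(worst, dissimilarity(base, attribution(w, x_pert)))
    return worst


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]
    x = [1.0, 0.2, -0.3]
    print(max_attribution_shift(w, x, eps=0.1))
```

In a training setting along the lines the TL;DR describes, a quantity like this would be minimized over the model parameters; the sketch stops at measuring it for fixed parameters.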
Abstract
Attribution maps are popular tools for explaining neural network
predictions. By assigning an importance value to each input dimension that
represents its impact towards the outcome, they give an intuitive expla