In this work, we instantiate a novel perturbation-based multi-class
explanation framework, LIPEx (Locally Interpretable Probabilistic Explanation).
We demonstrate that LIPEx not only locally replicates the probability
distributions output by widely used complex classification models but also
provides insight into how every feature deemed to be important affects the
prediction probability for each of the possible classes. We achieve this by
defining the explanation as a matrix obtained via regression with respect to
the Hellinger distance in the space of probability distributions. Ablation
tests on text and image data show that LIPEx-guided removal of important
features from the data causes more change in predictions for the underlying
model than similar tests on other saliency-based or feature importance-based
XAI methods. We also show that, compared to LIME, LIPEx is much more
data-efficient in terms of the number of perturbations needed for reliable
evaluation of the explanation.
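
The regression objective described above can be illustrated with a minimal sketch. It assumes the surrogate is a softmax over a linear map of a binary perturbation vector; that form, the class-by-feature layout of `W`, and the function names are illustrative assumptions and are not taken from the abstract. Only the use of the Hellinger distance between probability distributions comes from the text.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def surrogate_probs(W, z):
    """Class distribution predicted by the (assumed) linear-softmax surrogate.

    W: explanation matrix, one row per class and one column per feature
       (hypothetical layout).
    z: binary vector, 1 where a feature is kept in the perturbed input.
    """
    return softmax([sum(w * x for w, x in zip(row, z)) for row in W])


def mean_squared_hellinger(W, masks, model_probs):
    """Average squared Hellinger distance between surrogate and model outputs."""
    losses = [
        hellinger(surrogate_probs(W, z), p) ** 2
        for z, p in zip(masks, model_probs)
    ]
    return sum(losses) / len(losses)
```

Minimizing `mean_squared_hellinger` over `W` with any generic optimizer would yield one weight per (class, feature) pair, which is the sense in which the explanation is a matrix rather than a single vector for the predicted class.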

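This work introduces LIPEx (Locally Interpretable Probabilistic Explanation), a novel perturbation-based multi-class explanation framework, and shows that LIPEx not only locally replicates the probability distributions output by widely used complex classification models but also provides insight into how each feature deemed important affects the predicted probability of every possible class. The explanation is a matrix obtained by regression with respect to the Hellinger distance in the space of probability distributions. Ablation tests on text and image data show that LIPEx-guided removal of important features changes the underlying model's predictions more than comparable tests with other saliency-based or feature-importance-based XAI methods. The work also shows that, compared to LIME, LIPEx is more data-efficient in the number of perturbations needed to reliably evaluate the explanation.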

LIPEx -- Locally Interpretable Probabilistic Explanations -- To Look Beyond The True Class

The emerging field of Explainable Artificial Intelligence focuses on
researching methods of explaining the decision-making processes of complex
machine learning models. In the field of explainability for Computer Vision,
explanations are provided as saliency maps, which visualize the importance of
individual pixels of the input w.r.t. the model's prediction. In this work we
focus on a perturbation-based, model-agnostic explainability method called
RISE, elaborate on observed shortcomings of its grid-based approach and propose
two modifications: replacement of square occlusions with convex polygonal
occlusions based on cells of a Voronoi mesh and addition of an informativeness
guarantee to the occlusion mask generator. These modifications, collectively
called VRISE (Voronoi-RISE), are meant to, respectively, improve the accuracy
of maps generated using large occlusions and accelerate convergence of saliency
maps in cases where sampling density is either very low or very high. We
perform a quantitative comparison of accuracy of saliency maps produced by
VRISE and RISE on the validation split of ILSVRC2012, using a saliency-guided
content insertion/deletion metric and a localization metric based on bounding
boxes. Additionally, we explore the space of configurable occlusion pattern
parameters to better understand their influence on saliency maps produced by
RISE and VRISE. We also describe and demonstrate two effects observed over the
course of experimentation, arising from the random sampling approach of RISE:
"feature slicing" and "saliency misattribution". Our results show that convex
polygonal occlusions yield more accurate maps for coarse occlusion meshes and
multi-object images, but improvement is not guaranteed in other cases. The
informativeness guarantee is shown to increase the convergence rate without
incurring a significant computational overhead.
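
As a rough illustration of the idea, the sketch below builds binary occlusion masks from the cells of a Voronoi mesh and accumulates a RISE-style saliency map as a score-weighted average of the masks. It is a simplified toy under stated assumptions: `score_fn` is a hypothetical stand-in for the model's class score, the masks have hard edges, and the informativeness guarantee is omitted because the abstract does not specify how it works.

```python
import random


def voronoi_cells(height, width, num_cells, rng):
    """Assign every pixel to its nearest random seed point (its Voronoi cell)."""
    seeds = [(rng.uniform(0, height), rng.uniform(0, width)) for _ in range(num_cells)]
    cells = []
    for r in range(height):
        row = []
        for c in range(width):
            nearest = min(
                range(num_cells),
                key=lambda i: (seeds[i][0] - r) ** 2 + (seeds[i][1] - c) ** 2,
            )
            row.append(nearest)
        cells.append(row)
    return cells


def voronoi_mask(height, width, num_cells, keep_prob, rng):
    """Binary mask: each Voronoi cell stays visible (1.0) with probability keep_prob."""
    cells = voronoi_cells(height, width, num_cells, rng)
    visible = [rng.random() < keep_prob for _ in range(num_cells)]
    return [[1.0 if visible[i] else 0.0 for i in row] for row in cells]


def saliency(image, score_fn, num_masks, num_cells, keep_prob, seed=0):
    """Score-weighted average of random masks, normalized by the expected mask value."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    for _ in range(num_masks):
        mask = voronoi_mask(h, w, num_cells, keep_prob, rng)
        masked = [[image[r][c] * mask[r][c] for c in range(w)] for r in range(h)]
        score = score_fn(masked)  # model score on the occluded image
        for r in range(h):
            for c in range(w):
                acc[r][c] += score * mask[r][c]
    norm = num_masks * keep_prob
    return [[v / norm for v in row] for row in acc]
```

For example, with a toy `score_fn` that returns only the value of the top-left pixel, no pixel can receive more saliency than the top-left one, since the score is nonzero only when that pixel is visible.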

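This work studies explanation methods for machine learning models and introduces VRISE, a modification of RISE that replaces square occlusions with convex polygonal occlusions and adds an informativeness guarantee to the occlusion mask generator, aiming to improve explanation accuracy and accelerate convergence. Experiments show that VRISE yields more accurate saliency maps for coarse occlusion meshes and multi-object images, and that the informativeness guarantee does not incur significant computational overhead.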

Generating detailed saliency maps using model-agnostic methods

Adversarial examples are malicious inputs crafted to cause a model to
misclassify them. Their most common instantiation, "perturbation-based"
adversarial examples, introduce changes to the input that leave its true label
unchanged, yet result in a different model prediction. Conversely,
"invariance-based" adversarial examples insert changes to the input that leave
the model's prediction unaffected despite the underlying input's label having
changed.
In this paper, we demonstrate that robustness to perturbation-based
adversarial examples is not only insufficient for general robustness, but
worse, it can also increase vulnerability of the model to invariance-based
adversarial examples. In addition to analytical constructions, we empirically
study vision classifiers with state-of-the-art robustness to perturbation-based
adversaries constrained by an $\ell_p$ norm. We mount attacks that exploit
excessive model invariance in directions relevant to the task, which are able
to find adversarial examples within the $\ell_p$ ball. In fact, we find that
classifiers trained to be $\ell_p$-norm robust are more vulnerable to
invariance-based adversarial examples than their undefended counterparts.
Excessive invariance is not limited to models trained to be robust to
perturbation-based $\ell_p$-norm adversaries. In fact, we argue that the term
adversarial example is used to capture a series of model limitations, some of
which may not have been discovered yet. Accordingly, we call for a set of
precise definitions that taxonomize and address each of these shortcomings in
learning.
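
The two definitions can be made concrete with a small sketch. It assumes access to a labeling oracle (for instance, a human annotator) alongside the model; the names `classify_pair`, `model`, and `oracle` are hypothetical and serve only to illustrate the distinction drawn in the abstract.

```python
def lp_distance(x, y, p=float("inf")):
    """l_p distance between two equal-length vectors (p may be infinity)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def classify_pair(x, x_adv, model, oracle, eps, p=float("inf")):
    """Label the pair (x, x_adv) according to the two definitions above.

    model:  callable returning the model's predicted label (hypothetical).
    oracle: callable returning the true label (hypothetical).
    """
    if lp_distance(x, x_adv, p) > eps:
        return "outside ball"
    model_changed = model(x) != model(x_adv)
    label_changed = oracle(x) != oracle(x_adv)
    if model_changed and not label_changed:
        # True label unchanged, yet the model's prediction flips.
        return "perturbation-based"
    if label_changed and not model_changed:
        # True label changes, yet the model's prediction stays the same.
        return "invariance-based"
    return "neither"


# Toy setup: the true label depends on feature 0, but the model reads feature 1.
oracle = lambda v: int(v[0] > 0.5)
model = lambda v: int(v[1] > 0.5)
x = [0.4, 0.2]
print(classify_pair(x, [0.6, 0.2], model, oracle, eps=0.5))  # invariance-based
print(classify_pair(x, [0.4, 0.6], model, oracle, eps=0.5))  # perturbation-based
```

In the toy setup, both adversarial inputs lie inside the same $\ell_\infty$ ball around `x`, which shows how a fixed norm constraint can admit both kinds of failure at once.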

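This paper demonstrates that robustness to perturbation-based adversarial examples is not only insufficient for general robustness but can also increase a model's vulnerability to invariance-based adversarial examples, and it calls for a set of precise definitions that taxonomize and address these limitations in learning.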