Attribution maps are one of the most established tools for explaining how computer vision models function. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and the lack of inter-model comparisons. This allows us to evaluate 23 attribution methods and how eight different design choices of popular vision models affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than previous work suggests. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.
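
For readers unfamiliar with the incremental-deletion protocol mentioned above, the following is a minimal sketch of its generic form, not the new protocol proposed in this work. It assumes a single-channel image, a hypothetical `model_fn` callable that returns a vector of class scores, and a constant `baseline` value standing in for "deleted" pixels; all of these names and choices are illustrative assumptions. Pixels are removed in order of decreasing attribution, and the target score is recorded after each step.

```python
import numpy as np


def deletion_curve(model_fn, image, attribution, target, steps=10, baseline=0.0):
    """Target score after deleting increasing fractions of the input.

    model_fn    : hypothetical callable, (H, W) array -> 1-D array of class scores
    attribution : (H, W) array of importance scores, higher means more important
    baseline    : value written into deleted pixels (an assumption of this sketch)
    """
    # Rank pixels from most to least important.
    order = np.argsort(attribution.ravel())[::-1]
    perturbed = np.array(image, dtype=float)  # private copy of the input
    flat = perturbed.reshape(-1)              # view that writes through to `perturbed`
    scores = [model_fn(perturbed)[target]]    # score on the unmodified input
    chunk = int(np.ceil(order.size / steps))
    for start in range(0, order.size, chunk):
        # Delete the next most important group of pixels and re-score.
        flat[order[start:start + chunk]] = baseline
        scores.append(model_fn(perturbed)[target])
    return np.array(scores)


# Toy usage: a "model" whose only score is the pixel sum, so pixel value is a
# faithful attribution and the curve should fall as fast as possible.
toy_image = np.arange(16, dtype=float).reshape(4, 4)
toy_scores = deletion_curve(lambda x: np.array([x.sum()]), toy_image,
                            toy_image, target=0, steps=4)
print(toy_scores.tolist())  # [120.0, 66.0, 28.0, 6.0, 0.0]
```

A steeper drop is usually read as a better attribution. The partially deleted inputs produced inside the loop can look unlike anything the model saw during training, which is plausibly the out-of-domain concern the abstract raises, and raw scores from different models are not directly comparable on this curve.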

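By proposing a new evaluation protocol, we evaluate 23 attribution methods and how eight different design choices of vision models affect attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit higher quality. In addition, attribution quality changes consistently when the network design is varied, indicating that some standard design choices promote attribution quality.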

Benchmarking the Attribution Quality of Vision Models