Feature attribution methods are popular for explaining neural network predictions, and they are often evaluated on metrics such as comprehensiveness and sufficiency, which are motivated by the principle that more important features -- as judged by the explanation -- should have a larger impact on the model's prediction. In this paper, we highlight an intriguing property of these metrics: their solvability. Concretely, we can define the problem of optimizing an explanation for a metric and solve it using beam search. This brings up the obvious question: given such solvability, why do we still develop other explainers and then evaluate them on the metric? We present a series of investigations showing that this beam search explainer is generally comparable or favorable to current choices such as LIME and SHAP, suggest rethinking the goals of model interpretability, and identify several directions towards better evaluations of new method proposals.
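
To illustrate what optimizing an explanation for a metric can look like, the following is a minimal sketch rather than the implementation described in the paper. It assumes a simplified comprehensiveness score (the mean drop in model output as features are removed in ranked order), a fixed removal baseline of 0, and a hypothetical `toy_model`; the function names `comprehensiveness` and `beam_search_order` are illustrative and are not taken from any released package.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def comprehensiveness(model: Model, x: Sequence[float],
                      order: Sequence[int], baseline: float = 0.0) -> float:
    """Simplified comprehensiveness (an assumption of this sketch):
    mean drop in model output as features are removed in `order`."""
    if not order:
        return 0.0
    full = model(x)
    masked = list(x)
    total = 0.0
    for idx in order:
        masked[idx] = baseline          # remove one more feature
        total += full - model(masked)   # drop relative to the full input
    return total / len(order)


def beam_search_order(model: Model, x: Sequence[float], beam_size: int = 3,
                      baseline: float = 0.0) -> Tuple[List[int], float]:
    """Search for a removal order that maximizes the score above."""
    n = len(x)
    if n == 0:
        return [], 0.0
    full = model(x)
    # Each beam entry is (cumulative drop so far, partial removal order).
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for _ in range(n):
        candidates = []
        for score, order in beams:
            removed = set(order)
            for idx in range(n):
                if idx in removed:
                    continue
                masked = [baseline if (i in removed or i == idx) else v
                          for i, v in enumerate(x)]
                candidates.append((score + (full - model(masked)),
                                   order + [idx]))
        # Keep only the `beam_size` best partial orders.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    best_score, best_order = beams[0]
    return best_order, best_score / n


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, 2.0, -1.0]
    return sum(w * v for w, v in zip(weights, x))


order, score = beam_search_order(toy_model, [1.0, 1.0, 1.0], beam_size=2)
print(order, score)
```

The returned order is itself an explanation (a feature ranking), and its score is by construction at least as high as that of any ranking the search considered, which is what makes the comparison against explainers that are not optimized for the metric worth examining.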

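This paper studies feature attribution methods for explaining neural network predictions and raises a question: if the metric value represents explanation quality, why use an explainer such as LIME rather than directly optimizing the explanation for the metric? We implement such an explainer and release the Python solvex package, which can be used with models in domains such as text, images, and tabular data.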

The Solvability of Interpretability Evaluation Metrics