A Formal Framework to Characterize Interpretability of Procedures

Jul, 2017

Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam

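TL;DR: We provide a novel notion of interpretability that is defined relative to a target model. By relating it to practical factors such as accuracy and robustness, the framework allows interpretable procedures to be compared, and it characterizes the applicability of many current state-of-the-art interpretable methods.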

Abstract

We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a →