Interpreting the inner workings of neural networks is crucial for the
trustworthy development and deployment of these black-box models. Prior
interpretability methods focus on correlation-based measures to attribute model
decisions to individual examples. However, these measures are su