Explaining Text Classifiers with Counterfactual Representations

February 2024

Pirmin Lemberger, Antoine Saillenfest

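TL;DR: A simple method that generates counterfactuals by intervening in the space of text representations, for use in classifier explanation and bias mitigation.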

Abstract

One well-motivated explanation method for classifiers leverages counterfactuals, which are hypothetical events identical to real observations in all aspects except for one categorical feature. Constructing such co