In many machine learning applications, it is important for the user to understand the reasoning behind a classifier's recommendation or prediction. The learned models, however, are often too complicated for a human to understand. Research from the social sciences indicates that humans prefer counterfactual explanations over alternatives. In this paper, we present a general framework for generating counterfactual explanations in the textual domain. Our framework is model-agnostic, representation-agnostic, domain-agnostic, and anytime. We model the task as a search problem in a space where the initial state is the classified text and the goal state is a text in the complementary class. The operators transform a text by replacing parts of it. Our framework includes domain-independent operators but can also exploit domain-specific knowledge through specialized operators. The search algorithm attempts to find a text from the complementary class with minimal word-level Levenshtein distance from the original classified text.
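The search formulation can be illustrated with a minimal sketch. It is not the framework's actual implementation: the names `word_levenshtein`, `find_counterfactual`, `toy_classify`, and `TOY_REPLACEMENTS` are hypothetical, the only operator is single-word replacement from a fixed table, and an expansion budget is a simplified stand-in for the anytime behavior. Candidates closest to the original text, by word-level Levenshtein distance, are expanded first.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


def word_levenshtein(a: List[str], b: List[str]) -> int:
    """Edit distance where inserting, deleting, or substituting one word costs 1."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_counterfactual(
    text: str,
    classify: Callable[[str], str],
    replacements: Dict[str, List[str]],
    max_expansions: int = 1000,
) -> Optional[Tuple[str, int]]:
    """Best-first search from the classified text toward a text with a different label.

    Returns (counterfactual_text, distance), or None if the budget runs out.
    """
    original = text.split()
    source_label = classify(text)
    frontier = [(0, original)]      # priority = distance from the original text
    seen = {tuple(original)}
    expansions = 0
    while frontier and expansions < max_expansions:
        dist, words = heapq.heappop(frontier)
        candidate = " ".join(words)
        if classify(candidate) != source_label:
            return candidate, dist  # goal state: complementary class
        expansions += 1
        # Operator (assumed): replace one word with an alternative from the table.
        for i, word in enumerate(words):
            for sub in replacements.get(word, []):
                child = words[:i] + [sub] + words[i + 1:]
                key = tuple(child)
                if key not in seen:
                    seen.add(key)
                    heapq.heappush(frontier, (word_levenshtein(original, child), child))
    return None


# Toy black-box classifier and replacement table, purely for illustration.
def toy_classify(text: str) -> str:
    return "neg" if "bad" in text.split() else "pos"


TOY_REPLACEMENTS = {"good": ["bad", "fine"], "movie": ["film"]}

result = find_counterfactual("the movie was good", toy_classify, TOY_REPLACEMENTS)
print(result)  # ('the movie was bad', 1)
```

Because the classifier is only called as a black box on candidate strings, the sketch makes no assumption about the model or its text representation. Domain-specific operators would correspond to swapping in a different way of generating `child` candidates.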

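In machine learning, it is important for users to understand the reasoning behind a classifier's recommendation or prediction. However, learned models are often too complex for humans to understand. This paper presents a general framework for generating counterfactual explanations in the textual domain; the framework is model-agnostic, representation-agnostic, domain-agnostic, and anytime. The task is modeled as a search problem in which the initial state is the classified text and operators transform a text by replacing parts of it. The framework includes domain-independent operators but can also exploit domain-specific knowledge through specialized operators. The search algorithm attempts to find a text in the complementary class with minimal word-level Levenshtein distance from the original classified object.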

Anytime Generation of Counterfactual Explanations for Text Classification