Transformers have been shown to perform deductive reasoning over a logical rulebase whose rules and statements are written in natural-language English. While this progress is promising, it remains unclear whether these models actually reason logically by understanding the underlying logical semantics of the language. To investigate this, we propose RobustLR, a suite of evaluation datasets that test the robustness of these models to minimal logical edits in rulebases and to some standard logical equivalence conditions. In our experiments with RoBERTa and T5, we find that models trained in prior works do not perform consistently across the different perturbations in RobustLR, showing that they are not robust to the proposed logical perturbations. Further, we find that the models have particular difficulty learning the logical negation and disjunction operators. Overall, our evaluation sets expose some shortcomings of deductive reasoning-based language models, which can eventually help in designing better models for logical reasoning over natural language.

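Using the proposed RobustLR datasets, this paper evaluates the robustness of current Transformer models that perform deductive reasoning over logical rulebases written in English. The results show that these models behave inconsistently on rulebases that have undergone minimal logical edits and struggle to learn the logical negation and disjunction operators, revealing some shortcomings of these deductive reasoning-based natural language processing models.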

RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning