To address the problem of NLP classifiers learning spurious correlations
between training features and target labels, a common approach is to make the
model's predictions invariant to these features. However, this can be
counter-productive when the features have a non-zero causal effec