Attention modules do not always help deep models learn causal features that
are robust in any confounding context, e.g., a foreground object feature is
invariant to different backgrounds. This is because the confounders trick the
attention into capturing spurious correlations that benefit