Empirical studies suggest that machine learning models often rely on
features, such as the background, that may be spuriously correlated with the
label only at training time, resulting in poor accuracy at test time.
In this work, we identify the fundamental factors that give ri