Many machine learning algorithms are trained and evaluated by splitting data
from a single source into training and test sets. While such focus on
in-distribution learning scenarios has led to interesting advancement, it has
not been able to tell if models are relying on dataset biases