We are interested in learning robust models from insufficient data, without
relying on any externally pre-trained checkpoints. First, we show why,
compared to sufficient data, insufficient data makes a model more easily
biased toward the limited training environments, which usually differ from
the testing environments. For example, if all the training swan samples are