machine learning models fail to perform well on real-world applications when
1) the category distribution P(Y) of the training dataset suffers from
long-tailed distribution and 2) the test data is drawn from different
conditional distributions P(X|Y). Existing approaches cannot handle