Real-world visual data often exhibits a long-tailed distribution, where some ''head'' classes have a large number of samples, yet only a few samples are available for the ''tail'' classes. Such imbalanced distribution causes a great challenge for learning a deep neural network, which can be boiled down into a dilemma: on the one hand, we prefer to increase the exposure of the tail class samples to avoid the excessive dominance of head classes in the classifier training. On the other hand, oversampling tail classes makes the network prone to over-fitting, since the head class samples are often consequently under-represented. To resolve this dilemma, in this paper, we propose an embarrassingly simple-yet-effective approach. The key idea is to split a network into a classifier part and a feature extractor part, and then employ different training strategies for each part. Specifically, to promote the awareness of tail-classes, a class-balanced sampling scheme is utilised for training both the classifier and the feature extractor. For the feature extractor, we also introduce an auxiliary training task, which is to train a classifier under the regular random sampling scheme. In this way, the feature extractor is jointly trained from both sampling strategies and thus can take advantage of all training data and avoid the over-fitting issue. Apart from this basic auxiliary task, we further explore the benefit of using self-supervised learning as the auxiliary task. Without using any bells and whistles, our model achieves superior performance over the state-of-the-art solutions.

本文提出了一个简单而有效的辅助学习方法，通过对神经网络进行分类器和特征提取器的拆分，并针对每个部分采用不同的训练策略，如采用类平衡采样方案来提高对尾部类别的重视，并通过自监督学习进一步提高性能，从而解决了类别不平衡问题。

平衡还是不平衡: 一种简单而有效的长尾分布学习方法