TL;DR: By introducing the PEL method, this work adapts to long-tailed recognition tasks with fewer than 20 epochs of fine-tuning and no additional data, and addresses overfitting through a novel technique that initializes the classifier with the CLIP text encoder, consistently outperforming previous state-of-the-art methods.
Abstract
The "pre-training and fine-tuning" paradigm in addressing long-tailed
recognition tasks has sparked significant interest since the emergence of large
vision-language models like the contrastive language-image pre-training (CLIP).
While previous studies have shown promise in adapting pr