Real-world data often exhibit extreme imbalance and out-of-distribution
(OOD) instances, both of which significantly bias model training. While these
issues have been extensively studied in the vision and language domains separately, the impact
of long-tailed open worlds on multi-modal large language