Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, no benchmark has yet been curated that suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a public leaderboard (https://satinbenchmark.github.io) to guide and track the progress of VL models in this important domain.

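This work establishes SATIN, a benchmark for classifying satellite and remotely sensed imagery, in order to comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models, and finds SATIN to be a challenging benchmark. The metadataset is curated from 27 existing remotely sensed datasets, and the strongest method evaluated on it achieves a classification accuracy of 52.0%.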

SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models