Visual geo-localization (VG) is the task of identifying the location
depicted in query images. It is widely applied in robotics and
computer vision, in areas such as autonomous driving, the metaverse, augmented
reality, and SLAM. In fine-grained images lacking specific text d