Jul, 2024

Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation

TL;DR: This paper proposes OVFormer, a novel baseline for open-vocabulary video instance segmentation (VIS) that addresses the domain gap and the underuse of temporal consistency. It achieves state-of-the-art performance on LV-VIS and shows strong zero-shot generalization.