Jul 2024
Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
TL;DR: OVFormer is a new baseline for open-vocabulary video instance segmentation (VIS) that addresses the domain gap and the underuse of temporal consistency. It achieves state-of-the-art performance on LV-VIS and shows strong zero-shot generalization.