Video instance segmentation requires detecting, segmenting, and tracking objects in videos, and training typically relies on costly video annotations. This paper introduces a method that eliminates the need for video annotations by training on image datasets instead. We adapt the PM-VIS algorithm to handle both bounding-box and instance-level pixel annotations dynamically. We introduce ImageNet-bbox to supplement the missing categories in video datasets, and we propose the PM-VIS+ algorithm, which adjusts supervision according to the annotation type. To further improve accuracy, we apply pseudo masks and semi-supervised optimization techniques to unannotated video data. The method achieves high video instance segmentation performance without manual video annotations, offering a cost-effective solution and new perspectives for video instance segmentation applications. The code will be available at https://github.com/ldknight/PM-VIS-plus.
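
As an illustration of what adjusting supervision by annotation type could look like, the following is a minimal sketch and not the PM-VIS+ implementation. It assumes a Dice loss when a pixel mask is available and, for box-only samples, a projection loss in the style of box-supervised segmentation methods; the abstract does not specify either loss. All function names (`dice_loss`, `projection_loss`, `instance_loss`) are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[float]]          # H x W grid of values in [0, 1]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1); x1 and y1 are exclusive


def dice_loss(pred: List[float], target: List[float], eps: float = 1e-6) -> float:
    """Dice loss between two flat vectors (0 means perfect overlap)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def flatten(mask: Mask) -> List[float]:
    return [v for row in mask for v in row]


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Rasterize a box into a binary mask."""
    x0, y0, x1, y1 = box
    return [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0
             for x in range(width)] for y in range(height)]


def projection_loss(pred: Mask, box: Box) -> float:
    """Assumed box-only supervision: the max-projections of the predicted
    mask onto the x and y axes should match the projections of the box."""
    h, w = len(pred), len(pred[0])
    box_mask = box_to_mask(box, h, w)
    pred_x = [max(pred[y][x] for y in range(h)) for x in range(w)]
    pred_y = [max(row) for row in pred]
    box_x = [max(box_mask[y][x] for y in range(h)) for x in range(w)]
    box_y = [max(row) for row in box_mask]
    return dice_loss(pred_x, box_x) + dice_loss(pred_y, box_y)


def instance_loss(pred: Mask, box: Box, gt_mask: Optional[Mask] = None) -> float:
    """Pick the supervision signal from the annotation type of the sample."""
    if gt_mask is not None:
        # Instance-level pixel annotation: supervise the mask directly.
        return dice_loss(flatten(pred), flatten(gt_mask))
    # Box-only annotation (e.g. an ImageNet-bbox style sample): weaker signal.
    return projection_loss(pred, box)
```

A training loop built on this idea would call `instance_loss(pred, box, gt_mask)` for samples with pixel masks and `instance_loss(pred, box)` for box-only samples, so both kinds of image data can be mixed in one batch.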

High-Performance Video Instance Segmentation Without Video Annotations