Although deep learning methods have achieved strong video object recognition performance in recent years, perceiving heavily occluded objects in a video remains very challenging. To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in occluded scenarios. OVIS consists of 296k high-quality instance masks and 901 occluded scenes. While the human visual system can perceive occluded objects through contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, all baseline methods suffer a significant performance degradation of about 80% in the heavily occluded object group, which demonstrates that there is still a long way to go in understanding obscured objects and videos in complex real-world scenarios. To facilitate research on new paradigms for video understanding systems, we launched a challenge based on the OVIS dataset. The top-performing submitted algorithms achieve much higher performance than our baselines. In this paper, we introduce the OVIS dataset and further dissect it by analyzing the results of the baselines and the submitted methods. The OVIS dataset and challenge information can be found at http://songbai.site/ovis .

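This study addresses the recognition of objects under varying degrees of occlusion in video and introduces OVIS, a large-scale dataset containing 296k high-quality instance masks and 901 occluded scenes. On this dataset, all baseline methods suffer a performance drop of about 80% in the heavily occluded object group, showing that current systems still have a long way to go before they truly understand occluded objects and videos.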

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge