Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang...
TL;DR: This paper proposes MaskFreeVIS, a new method that achieves video instance segmentation through a KNN-like feature matching scheme, without requiring time-consuming and costly video mask annotations. Its effectiveness is validated on the YouTube-VIS 2019/2021, OVIS, and BDD100K MOTS benchmarks.
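The "KNN-like feature matching" mentioned above refers to matching small patches across neighboring frames by appearance and encouraging matched patches to receive consistent mask predictions, so that temporal consistency can stand in for per-pixel video mask labels. The snippet below is only a minimal, hypothetical sketch of this idea in PyTorch, not the paper's implementation: the function name, tensor shapes, patch size, K, the brute-force full-frame search, and the simple consistency term are all assumptions made for illustration (the actual method restricts matching to a local window and uses its own loss formulation).

```python
# Hypothetical sketch of a KNN-like temporal patch-consistency loss.
# Shapes, patch size, K, and the consistency term are illustrative assumptions.
import torch
import torch.nn.functional as F


def knn_temporal_consistency_loss(frame_a, frame_b, mask_a, mask_b,
                                  patch_size=3, k=5):
    """Match each patch in frame_a to its K most similar patches in frame_b
    and penalize disagreement between the predicted mask probabilities of
    the matched locations.

    frame_a, frame_b: (C, H, W) image tensors of two neighboring frames.
    mask_a, mask_b:   (H, W) predicted foreground probabilities in [0, 1].
    """
    C, H, W = frame_a.shape
    pad = patch_size // 2

    # Per-pixel patch descriptors: (H*W, C * patch_size**2)
    feat_a = F.unfold(frame_a[None], patch_size, padding=pad)[0].t()
    feat_b = F.unfold(frame_b[None], patch_size, padding=pad)[0].t()

    # Pairwise patch distances and the K nearest neighbors in frame_b for
    # every location of frame_a (brute force; fine for a toy example, while
    # a real implementation would search only a local window).
    dists = torch.cdist(feat_a, feat_b)              # (H*W, H*W)
    knn_idx = dists.topk(k, largest=False).indices   # (H*W, k)

    p_a = mask_a.reshape(-1, 1)                      # (H*W, 1)
    p_b = mask_b.reshape(-1)[knn_idx]                # (H*W, k)

    # Matched locations should agree: either both foreground or both
    # background, so cross terms are penalized.
    consistency = p_a * (1.0 - p_b) + (1.0 - p_a) * p_b
    return consistency.mean()


# Toy usage on random data: two 3x32x32 frames with predicted masks.
fa, fb = torch.rand(3, 32, 32), torch.rand(3, 32, 32)
ma = torch.rand(32, 32, requires_grad=True)
mb = torch.rand(32, 32, requires_grad=True)
loss = knn_temporal_consistency_loss(fa, fb, ma, mb)
loss.backward()  # gradients flow to the mask predictions only
```

Because the loss is driven purely by appearance matching between frames, it supplies a mask-level training signal even when only bounding-box annotations are available, which is the point of the mask-free setup summarized above.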
Abstract
The recent advancement in video instance segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this