We address the problem of instance-level semantic segmentation, which aims at jointly detecting, segmenting and classifying every individual object in an image. In this context, existing methods typically propose candidate objects, usually as bounding boxes, and directly predict a binary mask within each such proposal. As a consequence, they cannot recover from errors in the object candidate generation process, such as too small or shifted boxes. In this paper, we introduce a novel object segment representation based on the distance transform of the object masks. We then design an object mask network (OMN) with a new residual-deconvolution architecture that infers such a representation and decodes it into the final binary object mask. This allows us to predict masks that go beyond the scope of the bounding boxes and are thus robust to inaccurate object candidates. We integrate our OMN into a Multitask Network Cascade framework, and learn the resulting shape-aware instance segmentation (SAIS) network in an end-to-end manner. Our experiments on the PASCAL VOC 2012 and the CityScapes datasets demonstrate the benefits of our approach, which outperforms the state-of-the-art in both object proposal generation and instance segmentation.

该论文介绍了一种基于距离变换的对象段表示方法和基于残差解卷积结构的对象掩码网络，实现了跨越边界框的对象分割；并将其整合到一个多任务网络级联框架中，学习了最终二进制对象掩码。实验表明，这种方法在目标生成和实例分割方面优于现有的技术。

边界感知实例分割