Semantic segmentation has recently witnessed major progress, and fully convolutional neural networks have been shown to perform well. However, most previous work has focused on improving single-image segmentation. To our knowledge, no prior work has made use of temporal video information in a recurrent network. In this paper, we propose and implement a novel method for online semantic segmentation of video sequences that utilizes temporal data. The network combines a fully convolutional network with a gated recurrent unit that operates on a sliding window over consecutive frames. A convolutional gated recurrent unit is used to preserve spatial information and to reduce the number of learned parameters. Our method has the advantage that it can work in an online fashion instead of operating over the whole input batch of video frames. The architecture is tested on both binary and semantic video segmentation tasks. Experiments are conducted on the recent benchmarks SegTrack V2, DAVIS, Cityscapes, and SYNTHIA. Over a plain fully convolutional baseline network, it achieves a 5% improvement in F-measure on SegTrack V2 and a 3% improvement on DAVIS. It also achieves a 5.7% improvement in mean IoU on SYNTHIA and a 3.5% improvement in mean category IoU on Cityscapes over the baseline network. The performance of the recurrent fully convolutional network (RFCN) depends on its baseline fully convolutional network. The RFCN architecture can thus be seen as a method for improving a baseline segmentation network by exploiting spatiotemporal information in videos.
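
To make the sliding-window idea concrete, the following is a minimal pure-Python sketch of a convolutional gated recurrent unit applied online to a stream of per-frame feature maps. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the gate equations are the standard GRU formulation with matrix products replaced by convolutions, the feature maps stand in for the output of the fully convolutional network, a simple threshold stands in for the learned output layers, and all names (`conv_gru_step`, `segment_stream`, the parameter keys, the hand-picked kernels) are hypothetical.

```python
import math
from collections import deque


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def combine(f, *maps):
    """Apply f element-wise across equally sized 2D maps."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h, p):
    """One convolutional GRU update (assumed standard GRU gates, single channel)."""
    # Update gate z and reset gate r are computed with convolutions,
    # so the hidden state keeps the spatial layout of the input.
    z = combine(lambda a, b: sigmoid(a + b + p["bz"]),
                conv2d_same(x, p["Wz"]), conv2d_same(h, p["Uz"]))
    r = combine(lambda a, b: sigmoid(a + b + p["br"]),
                conv2d_same(x, p["Wr"]), conv2d_same(h, p["Ur"]))
    rh = combine(lambda a, b: a * b, r, h)
    cand = combine(lambda a, b: math.tanh(a + b + p["bh"]),
                   conv2d_same(x, p["Wh"]), conv2d_same(rh, p["Uh"]))
    # New state: per-pixel interpolation between old state and candidate.
    return combine(lambda zt, hp, c: (1.0 - zt) * hp + zt * c, z, h, cand)


def segment_stream(feature_maps, params, window=3, threshold=0.0):
    """Yield one binary mask per incoming frame, using only the last `window` frames."""
    recent = deque(maxlen=window)
    for fmap in feature_maps:
        recent.append(fmap)
        # Restart from a zero state and unroll over the current window only.
        h = [[0.0] * len(fmap[0]) for _ in fmap]
        for x in recent:
            h = conv_gru_step(x, h, params)
        # Hypothetical stand-in for the learned output layers: threshold the state.
        yield [[1 if v > threshold else 0 for v in row] for row in h]


# Toy usage with hand-picked (not learned) 3x3 identity kernels.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
params = {"Wz": identity, "Uz": identity, "Wr": identity, "Ur": identity,
          "Wh": identity, "Uh": identity, "bz": 0.0, "br": 0.0, "bh": 0.0}
frames = [[[1.0, -1.0], [-1.0, 1.0]] for _ in range(4)]
masks = list(segment_stream(frames, params, window=3))
```

Each mask is produced as soon as its frame arrives, which is what allows online operation. In this single-channel sketch the unit has six k x k kernels and three biases regardless of image size, whereas a fully connected recurrent unit over flattened feature maps would need weight matrices that grow with the number of pixels.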

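This paper proposes a recurrent fully convolutional network, built from a fully convolutional neural network and a gated recurrent architecture, that uses the temporal information in video for online semantic segmentation. It substantially improves segmentation accuracy and can be applied to both binary and semantic video segmentation tasks.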

Convolutional Gated Recurrent Networks for Video Segmentation