Haoqian Wu, Keyu Chen, Yanan Luo, Ruizhi Qiao, Bo Ren...
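TL;DR: The paper proposes an effective self-supervised learning (SSL) framework that improves model generalization by exploring a large number of data augmentation and shuffling methods, and introduces a simple temporal model to verify the quality of shot features, thereby achieving scene consistency. The method achieves state-of-the-art performance on the Video Scene Segmentation task, and the paper also proposes a fairer and more reasonable evaluation method.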
Abstract
A long-term video, such as a movie or TV show, is composed of various scenes, each of which represents a series of shots sharing the same semantic story. Spotting the correct scene boundary from the long-term video is a challenging task, since a model must understand the storyline of the video to figure out where a scene starts and ends. To this end, we prop