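TL;DR: This paper proposes MoNet, a motion hallucination network that imagines optical flow features from appearance features without relying on optical flow computation. It substantially improves video classification performance while helping to cut the computation and data-storage burden by half.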
Abstract
Appearance and motion are two key components for depicting and characterizing
video content. Currently, two-stream models have achieved state-of-the-art
performance on video classification. However, extracting