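TL;DR: This paper proposes the Pseudo-3D Residual Net (P3D ResNet), an architecture built from 4 Pseudo-3D residual blocks, and applies it to video classification. The design addresses the high computational cost and memory requirements of 3D CNNs, and by combining spatial convolutions with temporal convolutions it significantly improves the accuracy of video recognition and classification.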
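To make the cost argument concrete, here is a minimal sketch that compares the weight count of a full 3×3×3 convolution with a spatial-plus-temporal pair. It assumes the pair is a 1×3×3 spatial filter followed by a 3×1×1 temporal filter, which the summary above does not spell out. The function names and the channel width of 64 are hypothetical and chosen only for illustration; biases are ignored.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution with a kt x kh x kw kernel (no bias)."""
    return c_in * c_out * kt * kh * kw


def full_3d_params(channels):
    """Weights of a single 3x3x3 convolution with equal input/output channels."""
    return conv3d_params(channels, channels, 3, 3, 3)


def pseudo_3d_params(channels):
    """Weights of a 1x3x3 spatial convolution plus a 3x1x1 temporal convolution.

    Assumption: this factorization stands in for "combining spatial and
    temporal convolutions"; the summary itself does not give kernel sizes.
    """
    spatial = conv3d_params(channels, channels, 1, 3, 3)
    temporal = conv3d_params(channels, channels, 3, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    c = 64  # hypothetical channel width
    print("full 3D:  ", full_3d_params(c))
    print("pseudo-3D:", pseudo_3d_params(c))
```

Under these assumptions the factorized pair uses 12 weights per channel pair instead of 27, so the saving is independent of the channel width.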
Abstract
Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial when utilizing a CNN for learning spatio-temporal →