In this paper, we empirically study how to make the most of low-resolution
frames for efficient video recognition. Existing methods mainly focus on
developing compact networks or alleviating temporal redundancy of video inputs
to increase efficiency, whereas compressing frame resolutio