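TL;DR: This paper proposes Tri-Level E-ViT, a training framework that reduces data redundancy from three sparsity perspectives. It shows that the framework not only speeds up training across a variety of ViT architectures but also improves accuracy.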
Abstract
Vision transformers (ViTs) have recently achieved success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression