$N{:}M$ sparsity is an emerging model compression technique that a growing number of accelerators support to speed up sparse matrix multiplication in deep neural networks. Most existing $N{:}M$ sparsity methods compress neural networks.
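For concreteness, the sketch below illustrates what an $N{:}M$ constraint means in practice: within every group of $M$ consecutive weights, only $N$ may be nonzero (e.g. 2:4). The magnitude-based selection, the NumPy implementation, and the function name `nm_prune` are illustrative assumptions, not the specific method of any work discussed here.

```python
import numpy as np

def nm_prune(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Zero out all but the n largest-magnitude entries in each group of m
    consecutive weights along each row (illustrative N:M pruning sketch)."""
    rows, cols = weights.shape
    assert cols % m == 0, "number of columns must be divisible by the group size m"
    groups = weights.reshape(rows, cols // m, m)            # split each row into groups of m
    # indices of the (m - n) smallest-magnitude entries in every group
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)           # keep only the top-n per group
    return (groups * mask).reshape(rows, cols)

if __name__ == "__main__":
    w = np.random.randn(4, 8).astype(np.float32)
    w_sparse = nm_prune(w, n=2, m=4)
    # each group of 4 consecutive weights now contains exactly 2 nonzeros
    print((w_sparse.reshape(4, 2, 4) != 0).sum(axis=-1))
```

Hardware such as sparse tensor cores can exploit this regular structure to skip the zeroed entries during matrix multiplication, which is what makes $N{:}M$ sparsity attractive compared with unstructured pruning.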
Vision transformers are state-of-the-art models that use attention to identify the most informative regions of an image, but their behavior with respect to sparse double descent, and hence their optimal model size, remains unknown.