As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity: encouraging zero values in parameters, which can then be discarded from storage or computation. While most research focuses on high levels of sparsity, there are challenges in universally maintaining model accuracy as well as achieving significant speedups over modern matrix-math hardware. To make sparsity adoption practical, the NVIDIA Ampere GPU architecture introduces sparsity support in its matrix-math units, Tensor Cores. We present the design and behavior of Sparse Tensor Cores, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units. We also describe a simple workflow for training networks that both satisfy the 2:4 sparsity pattern requirements and maintain accuracy, verifying it on a wide range of common tasks and model architectures. This workflow makes it easy to prepare accurate models for efficient deployment on Sparse Tensor Cores.

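Summary: the paper introduces the Sparse Tensor Cores of the NVIDIA Ampere GPU architecture, which exploit a 2:4 sparsity pattern to deliver twice the math throughput of dense matrix units. It also proposes a simple workflow for training networks that satisfy the 2:4 sparsity pattern while maintaining accuracy, so that accurate models can be deployed efficiently on Sparse Tensor Cores.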
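
To make the 2:4 (50%) pattern concrete, here is a minimal sketch in plain Python. It assumes the usual reading of 2:4: every group of four consecutive values contains at least two zeros. The abstract does not say how the two zeroed values are chosen, so keeping the two largest-magnitude values is an assumption made here for illustration. The helper names `prune_2_4` and `satisfies_2_4` are hypothetical and not part of any NVIDIA library.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in every group of four.

    Assumption: magnitude-based selection; the abstract does not
    specify how the surviving values are picked.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def satisfies_2_4(row):
    """True if every group of four consecutive values has at least two zeros."""
    return len(row) % 4 == 0 and all(
        sum(1 for v in row[s:s + 4] if v == 0) >= 2
        for s in range(0, len(row), 4)
    )


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse = prune_2_4(weights)
print(sparse)                 # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(satisfies_2_4(sparse))  # True
```

Each group of four keeps exactly two values, which is where the 50% figure comes from. The sketch only illustrates the pattern on a single row of weights; it does not model the hardware or the training workflow the paper describes.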

Accelerating Sparse Deep Neural Networks