Larger and deeper neural network architectures deliver improved accuracy on a variety of tasks, but training them requires a large amount of memory to store the intermediate activations needed for back-propagation. We introduce an approximation strategy that significantly reduces this memory footprint.
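To make the idea concrete, the sketch below shows one common way an approximation can shrink activation memory: saving an 8-bit quantized copy of an activation for the backward pass instead of the full-precision tensor. This is only an illustrative example under assumed choices (PyTorch, per-tensor int8 quantization, names such as `QuantizedLinearFn` and `quantize` are invented here), not the specific strategy proposed in this work.

```python
# Minimal sketch (illustrative, not this paper's method): store an approximate
# (int8-quantized) copy of the input activation for back-propagation.
import torch

def quantize(x):
    # Per-tensor symmetric int8 quantization of the saved activation.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

class QuantizedLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        # The forward pass uses the exact activation; only the *saved* copy is approximate.
        q, scale = quantize(x)
        ctx.save_for_backward(q, scale, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        q, scale, weight = ctx.saved_tensors
        x_approx = q.to(grad_out.dtype) * scale   # dequantize the stored activation
        grad_x = grad_out @ weight                # exact: does not need the stored x
        grad_w = grad_out.t() @ x_approx          # approximate: uses the dequantized x
        return grad_x, grad_w

# Usage: gradients are computed from the compressed activation, trading a small
# amount of gradient accuracy for ~4x less activation memory (fp32 -> int8).
x = torch.randn(64, 256, requires_grad=True)
w = torch.randn(512, 256, requires_grad=True)
QuantizedLinearFn.apply(x, w).sum().backward()
```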