Modern deep neural networks (DNNs) require significant memory to store
weights, activations, and other intermediate tensors during training. As a
result, many models do not fit on a single GPU or can be trained only with a
small per-GPU batch size. This survey provides a systematic overview o
per-GPU batch size. This survey provides a systematic overview o