Modern deep neural networks (DNNs) require significant memory to store
weights, activations, and other intermediate tensors during training. As a
result, many models do not fit on a single GPU or can be trained only with a
small per-GPU batch size. This survey provides a systematic overview o
per-GPU batch size. This survey provides a systematic overview o