BriefGPT.xyz
May, 2024
2BP:2阶段反向传播
2BP: 2-Stage Backpropagation
HTML
PDF
Christopher Rae, Joseph K. L. Lee, James Richings
TL;DR
通过将反向传播步骤分为两个独立阶段,本文引入2阶段反向传播(2BP),以减少空闲计算时间,并在各种模型架构和管道调度上测试2BP,从而在所有情况下实现吞吐量的增加。使用2BP,相较于传统方法,在训练一个类似LLaMa的Transformer时,能够实现吞吐量的1.70倍增长,模型参数为70亿个,跨4个GPU。
Abstract
As
deep neural networks
(DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators.
pipeline parallelis
→