BriefGPT.xyz
Jun, 2024
PETRA: Parallel End-to-end Training with Reversible Architectures
Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg...
TL;DR
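We introduce PETRA, an alternative method for parallelizing the training of deep models. By keeping a single version of the parameters, it avoids the problem of storing multiple copies of the weights, and it achieves accuracy competitive with backpropagation on CIFAR-10, ImageNet32, and ImageNet.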
Abstract
Reversible architectures have been shown to be capable of performing on par with their non-reversible architectures, being applied in deep learning for memory savings and generative modeling. In this work, we sho