Feb, 2018
Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev, Mike Del Balso
TL;DR
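This paper introduces Horovod, an open-source library that uses ring reductions for efficient inter-GPU communication and requires only a few changes to user code, making distributed training in TensorFlow faster and easier.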
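The summary does not spell out how a ring reduction works, so the following is a minimal single-process sketch of the generic ring allreduce pattern, included only to make the idea concrete. It is not Horovod's implementation: the function name `ring_allreduce`, the list-of-lists representation of workers, and the chunking scheme are all assumptions made for illustration.

```python
def ring_allreduce(worker_data):
    """Simulate a ring allreduce (sum) over equal-length per-worker vectors.

    Hypothetical illustration only: each "worker" is a plain Python list,
    and message passing is simulated by copying slices between lists.
    """
    n = len(worker_data)
    length = len(worker_data[0])
    assert all(len(v) == length for v in worker_data)

    # Split every vector into n contiguous chunks (some may be empty).
    bounds = [(k * length) // n for k in range(n + 1)]
    bufs = [list(v) for v in worker_data]  # work on copies

    def chunk(k):
        return range(bounds[k], bounds[k + 1])

    # Phase 1 (scatter-reduce): after n - 1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so sends are simultaneous.
        sends = []
        for i in range(n):
            k = (i - step) % n
            sends.append((k, [bufs[i][j] for j in chunk(k)]))
        for i, (k, payload) in enumerate(sends):
            dst = (i + 1) % n  # each worker sends to its right neighbour
            for j, val in zip(chunk(k), payload):
                bufs[dst][j] += val

    # Phase 2 (allgather): circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            k = (i + 1 - step) % n
            sends.append((k, [bufs[i][j] for j in chunk(k)]))
        for i, (k, payload) in enumerate(sends):
            dst = (i + 1) % n
            for j, val in zip(chunk(k), payload):
                bufs[dst][j] = val

    return bufs


if __name__ == "__main__":
    gradients = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for reduced in ring_allreduce(gradients):
        print(reduced)
```

In this sketch each worker only ever exchanges one chunk per step with its ring neighbour, so the data sent per worker stays roughly constant as workers are added. That property is what makes the pattern attractive for gradient averaging.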
Abstract
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two c