June 2022

Scalable and Adaptive Mixture-of-Experts

TL;DR: Tutel is a highly scalable stack design and implementation for Mixture-of-Experts (MoE) with dynamically adaptive parallelism and pipelining. It achieves up to a 5.75x speedup of a single MoE layer on 2,048 GPUs over Fairseq, and it runs a real-world MoE-based model, SwinV2-MoE, both efficiently and effectively.
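To make the MoE computation concrete, below is a minimal single-device sketch of a top-k gated MoE layer in plain PyTorch. This is not Tutel's implementation: Tutel distributes the token dispatch and expert computation across GPUs with adaptive parallelism and pipelining, while this sketch only shows the gating-and-routing arithmetic on one device. The class name `SimpleMoELayer` and all parameter names are illustrative assumptions, not part of Tutel's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    """Illustrative top-k gated MoE layer (single device, no expert parallelism)."""

    def __init__(self, model_dim: int, hidden_dim: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # Router: produces one score per expert for each token.
        self.gate = nn.Linear(model_dim, num_experts)
        # Experts: independent feed-forward networks of identical shape.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(model_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, model_dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, model_dim)
        scores = F.softmax(self.gate(x), dim=-1)            # (tokens, experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # each (tokens, k)
        out = torch.zeros_like(x)
        # Dispatch each token to its top-k experts and combine the expert
        # outputs weighted by the gate scores.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 8 tokens of width 16 through 4 experts with top-2 gating.
layer = SimpleMoELayer(model_dim=16, hidden_dim=32, num_experts=4, k=2)
y = layer(torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 16])
```

In a distributed setting the per-expert loop above becomes an all-to-all exchange that ships each token to the GPU hosting its selected expert; choosing how to parallelize and pipeline that exchange as workloads shift is exactly the problem Tutel's adaptive design targets.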