June 2022
Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, et al.
TL;DR: Tutel is a highly scalable stack design for Mixture-of-Experts (MoE) with dynamically adaptive parallelism and pipelining. It achieves up to a 5.75x speedup of a single MoE layer on 2,048 GPUs over Fairseq, and delivers both efficiency and effectiveness when running a real-world MoE-based model, SwinV2-MoE.
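To make the MoE idea concrete, here is a minimal, framework-free sketch of the top-2 gated expert layer that systems like Tutel accelerate: a gate scores each token, the two highest-scoring experts process it, and their outputs are combined with softmax weights. This is an illustrative NumPy sketch, not Tutel's actual implementation; the names `top2_moe`, `gate_w`, and `expert_ws` are hypothetical.

```python
import numpy as np

def top2_moe(x, gate_w, expert_ws):
    """Top-2 gated Mixture-of-Experts layer (illustrative sketch).

    x:         (tokens, d) input activations
    gate_w:    (d, num_experts) gating weights
    expert_ws: list of (d, d) expert weight matrices
    """
    logits = x @ gate_w                          # (tokens, E) routing scores
    top2 = np.argsort(logits, axis=1)[:, -2:]    # 2 best experts per token
    sel = np.take_along_axis(logits, top2, axis=1)
    # Softmax over the two selected logits gives the combine weights.
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    y = np.zeros_like(x)
    for t in range(x.shape[0]):                  # dispatch + weighted combine
        for j in range(2):
            e = top2[t, j]
            y[t] += w[t, j] * (x[t] @ expert_ws[e])
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gate_w = rng.standard_normal((8, 4))
expert_ws = [rng.standard_normal((8, 8)) for _ in range(4)]
y = top2_moe(x, gate_w, expert_ws)
```

In a real distributed setting the per-token dispatch loop is replaced by batched all-to-all communication across GPUs, which is exactly the part Tutel's adaptive parallelism and pipelining optimize.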