Mar, 2024
Scattered Mixture-of-Experts Implementation
Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville
TL;DR
By introducing ScatterMoE and ParallelLinear, this work implements Sparse Mixture-of-Experts on GPUs, demonstrates higher throughput and lower memory usage in comparisons against Megablocks, and shows how ParallelLinear extends to Mixture-of-Attention concepts.
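
The sketch below is a minimal, pure-PyTorch illustration of the general scatter-based SMoE idea described above: tokens are grouped by their assigned expert, each expert's linear layer runs on its contiguous group, and results are written straight back into place, so no padded or duplicated token copies are materialized. It is not the paper's ScatterMoE/ParallelLinear implementation (which relies on fused GPU kernels); the function and variable names here are illustrative assumptions.

```python
import torch

def naive_scatter_moe(x, router_logits, expert_weights):
    """Conceptual top-1 SMoE forward pass (illustrative only).

    x:              (num_tokens, d_model) input tokens
    router_logits:  (num_tokens, num_experts) routing scores
    expert_weights: (num_experts, d_model, d_out) one linear layer per expert
    """
    num_tokens, _ = x.shape
    num_experts, _, d_out = expert_weights.shape

    # Top-1 routing: each token goes to its highest-scoring expert.
    expert_idx = router_logits.argmax(dim=-1)                   # (num_tokens,)

    # Sort token indices by expert so each expert sees a contiguous group ("gather").
    order = torch.argsort(expert_idx)
    counts = torch.bincount(expert_idx, minlength=num_experts)  # tokens per expert

    out = x.new_zeros(num_tokens, d_out)
    start = 0
    for e in range(num_experts):
        end = start + counts[e].item()
        if end > start:
            rows = order[start:end]                  # token ids routed to expert e
            # Apply expert e's weights to its group and write results back in place
            # ("scatter"), without padding groups to a common size.
            out[rows] = x[rows] @ expert_weights[e]
        start = end
    return out

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 16)
    logits = torch.randn(8, 4)
    w = torch.randn(4, 16, 32)
    print(naive_scatter_moe(x, logits, w).shape)  # torch.Size([8, 32])
```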
Abstract
We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.