Mar, 2024
Scattered Mixture-of-Experts Implementation
Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville
TL;DR
By introducing ScatterMoE and ParallelLinear, this work implements Sparse Mixture-of-Experts on GPUs, demonstrates higher throughput and lower memory usage in comparisons against Megablocks, and shows how ParallelLinear extends to Mixture-of-Attention concepts.
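
The sketch below is a minimal, pure-PyTorch illustration of the general scatter-based SMoE idea described above: tokens are grouped by their assigned expert, each expert's linear layer runs on its contiguous group, and results are written straight back into place, so no padded or duplicated token copies are materialized. It is not the paper's ScatterMoE/ParallelLinear implementation (which relies on fused GPU kernels); the function and variable names here are illustrative assumptions.

```python
import torch

def naive_scatter_moe(x, router_logits, expert_weights):
    """Conceptual top-1 SMoE forward pass (illustrative only).

    x:              (num_tokens, d_model) input tokens
    router_logits:  (num_tokens, num_experts) routing scores
    expert_weights: (num_experts, d_model, d_out) one linear layer per expert
    """
    num_tokens, _ = x.shape
    num_experts, _, d_out = expert_weights.shape

    # Top-1 routing: each token goes to its highest-scoring expert.
    expert_idx = router_logits.argmax(dim=-1)                   # (num_tokens,)

    # Sort token indices by expert so each expert sees a contiguous group ("gather").
    order = torch.argsort(expert_idx)
    counts = torch.bincount(expert_idx, minlength=num_experts)  # tokens per expert

    out = x.new_zeros(num_tokens, d_out)
    start = 0
    for e in range(num_experts):
        end = start + counts[e].item()
        if end > start:
            rows = order[start:end]                  # token ids routed to expert e
            # Apply expert e's weights to its group and write results back in place
            # ("scatter"), without padding groups to a common size.
            out[rows] = x[rows] @ expert_weights[e]
        start = end
    return out

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 16)
    logits = torch.randn(8, 4)
    w = torch.randn(4, 16, 32)
    print(naive_scatter_moe(x, logits, w).shape)  # torch.Size([8, 32])
```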
Abstract
We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.