BriefGPT.xyz
Feb, 2024
Enhancing Efficiency in Sparse Models with Sparser Selection
Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao...
TL;DR
Proposes a novel MoE model that leverages small experts and a threshold-based router, reducing computational load by more than 50% without sacrificing performance.
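The TL;DR mentions a threshold-based router over small experts. As a rough illustration only (the paper's exact routing rule is not given here; `threshold`, `max_experts`, and the fallback behavior are assumptions), one way a threshold-based selection could look:

```python
import numpy as np

def threshold_route(router_logits, threshold=0.1, max_experts=4):
    """Hypothetical sketch of threshold-based expert selection.

    Instead of a fixed top-k, keep only experts whose routing
    probability exceeds `threshold` (capped at `max_experts`),
    so easy tokens activate fewer experts and compute is saved.
    """
    # Softmax over expert logits (numerically stable form).
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    # Candidates in descending probability order, capped at max_experts.
    candidates = np.argsort(probs)[::-1][:max_experts]
    selected = [int(i) for i in candidates if probs[i] >= threshold]
    # Fall back to the single best expert if none pass the threshold.
    if not selected:
        selected = [int(np.argmax(probs))]
    # Renormalize the kept probabilities into combination weights.
    weights = probs[selected] / probs[selected].sum()
    return selected, weights
```

With a sharply peaked router distribution this selects a single expert, while a flat distribution activates several, which is the intuition behind routing-dependent compute savings.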
Abstract
Sparse models, including sparse mixture-of-experts (MoE) models, have emerged as an effective approach for scaling transformer models. How …