May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian...
TL;DR
This paper introduces MiniMoE, a model compression framework that uses a minimal mixture of experts as the student to bridge the teacher-student capacity gap in pretrained language models, reducing inference compute and compressed model size while preserving accuracy.
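The intuition behind an MoE student is that a sparsely gated mixture of experts enlarges the student's parameter count without a matching increase in per-token compute, since each token activates only one expert. Below is a minimal sketch of a top-1-gated MoE feed-forward layer in PyTorch; the names (MoEFeedForward, num_experts, d_model, d_ff) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Sketch: parameters scale with num_experts, but each token
    is routed to exactly one expert, so per-token compute stays
    close to that of a single dense feed-forward block."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        expert_idx = self.gate(tokens).argmax(dim=-1)  # top-1 routing
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(tokens[mask])  # only routed tokens run here
        return out.reshape_as(x)
```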
Abstract
Pretrained language models (LMs) have shown compelling performance on various downstream tasks, but unfortunately they require a tremendous amount of inference compute. Knowledge distillation finds a path to compress LMs into smaller ones with a teacher-student paradigm.
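For context, the canonical distillation objective trains the student to match the teacher's softened output distribution. The sketch below shows the standard temperature-scaled KD loss of Hinton et al. (2015), not the paper's specific objective; `temperature` is an illustrative hyperparameter.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions with the same temperature
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return kl * temperature ** 2
```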