Apr, 2020
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER
Subhabrata Mukherjee, Ahmed Awadallah
TL;DR
This work focuses on multilingual named entity recognition and explores several knowledge-distillation strategies for compressing pre-trained language models. Through a stage-wise optimization scheme that leverages the teacher model's internal representations, it compresses MBERT by 35x in parameters and 51x in batch-inference latency while retaining 95% of its F1 score across 41 languages.
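The stage-wise scheme described above optimizes the student against a sequence of objectives: first matching the teacher's internal representations, then its soft predictions, and finally the gold labels. The PyTorch sketch below illustrates one plausible form of these three losses; the dimensions, the use of MSE for both distillation stages, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed dimensions: a 768-wide teacher (MBERT) and a narrower student.
TEACHER_DIM, STUDENT_DIM, NUM_TAGS = 768, 312, 9

# Learned projection so the narrow student hidden states can be compared
# against the wider teacher representations.
proj = nn.Linear(STUDENT_DIM, TEACHER_DIM)

def stage1_loss(student_hidden, teacher_hidden):
    # Stage 1: regress the student's internal representations onto the
    # teacher's; both are [batch, seq_len, dim] tensors.
    return F.mse_loss(proj(student_hidden), teacher_hidden)

def stage2_loss(student_logits, teacher_logits):
    # Stage 2: match the teacher's per-token soft predictions (logits).
    return F.mse_loss(student_logits, teacher_logits)

def stage3_loss(student_logits, labels):
    # Stage 3: fine-tune on gold NER tags with standard cross-entropy.
    return F.cross_entropy(student_logits.view(-1, NUM_TAGS), labels.view(-1))

# Toy usage with random tensors of the assumed shapes.
s_hid, t_hid = torch.randn(2, 16, STUDENT_DIM), torch.randn(2, 16, TEACHER_DIM)
print(stage1_loss(s_hid, t_hid).item())
```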
Abstract
Deep and large pre-trained language models are the state-of-the-art for various natural language processing tasks. However, the huge size of these models could be a deterrent to using them in practice. Some recent and concurrent works use knowledge distillation to compress these huge models into shallow ones.