June 2021
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
Chuhan Wu, Fangzhao Wu, Yongfeng Huang
TL;DR
This paper proposes MT-BERT, a multi-teacher knowledge distillation framework that trains a high-quality student model from multiple teacher PLMs, and validates its effectiveness in compressing PLMs on three benchmark datasets.
Abstract
Pre-trained language models (PLMs) achieve great success in NLP. However, their huge model sizes hinder their applications in many practical systems. Knowledge distillation is a popular technique to compress PLMs…
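As a rough illustration of the general idea behind multi-teacher distillation, the sketch below (in PyTorch) averages several teachers' softened predictions and trains a student to match them while also fitting the ground-truth labels. The plain averaging of teachers, the temperature, and the `alpha` weighting are illustrative assumptions, not the MT-BERT design described in the paper.

```python
# A minimal, generic sketch of multi-teacher knowledge distillation.
# NOTE: plain averaging of teachers and a fixed alpha are assumptions for
# illustration; they are not the MT-BERT method from the paper.
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits, teacher_logits_list, labels,
                               temperature=2.0, alpha=0.5):
    """student_logits: (batch, num_classes); teacher_logits_list: list of same-shape tensors."""
    # Softened teacher distributions, averaged across all teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's softened log-probabilities and the
    # averaged teacher distribution (scaled by T^2, as in standard distillation).
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2

    # Supervised loss on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

# Usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, num_classes = 8, 3
    student = torch.randn(batch, num_classes, requires_grad=True)
    teachers = [torch.randn(batch, num_classes) for _ in range(3)]
    labels = torch.randint(0, num_classes, (batch,))
    loss = multi_teacher_distill_loss(student, teachers, labels)
    loss.backward()
    print(loss.item())
```

In this simplified form, the only change from single-teacher distillation is that the soft target is an average over several teachers' predictions; how the teachers are weighted and combined is exactly the kind of design choice the paper addresses.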