Nov, 2019
MKD: A Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation for Pretrained Models
Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong
TL;DR
This paper proposes a knowledge distillation method based on multi-task learning for training lightweight pretrained models. The approach works with different teacher model architectures and, compared with conventional LSTM-based methods, offers stronger language representation ability and faster inference.
Abstract
In this paper, we explore the knowledge distillation approach under the multi-task learning setting. We distill the BERT model refined by multi-task …
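As context for the distillation setting described above, the sketch below shows the standard soft-target distillation objective that multi-task knowledge distillation builds on: the student matches the teacher's temperature-softened logits while also fitting the ground-truth labels. The function name, temperature, and loss weighting are illustrative assumptions, not the paper's exact multi-task objective.

```python
# Minimal sketch of a soft-target knowledge distillation loss (assumed setup,
# not the paper's exact MKD objective): KL divergence against the teacher's
# softened logits plus cross-entropy against the gold labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine soft-target KL divergence with hard-label cross-entropy."""
    # Soften both distributions; T^2 rescales gradients to the usual magnitude.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage (hypothetical models): teacher logits come from the frozen,
# multi-task-refined BERT; student logits from the lightweight model.
# loss = distillation_loss(student(batch), teacher(batch).detach(), batch_labels)
```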