Feb 2023
Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao...
TL;DR
This paper proposes a knowledge distillation framework based on the actor-critic method, which selects appropriate knowledge from the teacher model for training the student model; experiments show that the method outperforms standard baselines on the GLUE benchmark.
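The paper's code is not shown on this page; the following is a minimal PyTorch sketch of the general idea of actor-critic knowledge selection, not the authors' implementation. It assumes two candidate knowledge types (teacher logits and teacher hidden states), a hypothetical `actor` that samples which knowledge to transfer at each step, and a hypothetical `critic` baseline; the training-state feature and the reward (the drop in task loss after an update) are likewise illustrative assumptions.

```python
# Minimal sketch of actor-critic knowledge selection for distillation.
# Module shapes, the training-state feature, and the reward are all
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
IN_DIM, HID, N_CLASSES, N_KNOWLEDGE = 32, 64, 4, 2  # knowledge types: logits, hidden states

teacher = nn.Sequential(nn.Linear(IN_DIM, HID), nn.ReLU(), nn.Linear(HID, N_CLASSES))
student = nn.Sequential(nn.Linear(IN_DIM, HID), nn.ReLU(), nn.Linear(HID, N_CLASSES))

# Actor: maps a small "training state" to selection probabilities over knowledge types.
actor = nn.Linear(N_KNOWLEDGE + 1, N_KNOWLEDGE)
# Critic: value baseline that reduces the variance of the policy gradient.
critic = nn.Linear(N_KNOWLEDGE + 1, 1)

opt_student = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_ac = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

def distillation_losses(x, y):
    """Candidate knowledge signals extracted from the teacher, plus the task loss."""
    with torch.no_grad():
        t_logits = teacher(x)
        t_hidden = teacher[1](teacher[0](x))  # teacher's intermediate features
    s_logits = student(x)
    s_hidden = student[1](student[0](x))
    kd_logits = F.kl_div(F.log_softmax(s_logits, dim=-1),
                         F.softmax(t_logits, dim=-1), reduction="batchmean")
    kd_hidden = F.mse_loss(s_hidden, t_hidden)
    task_ce = F.cross_entropy(s_logits, y)
    return torch.stack([kd_logits, kd_hidden]), task_ce

for step in range(200):
    x = torch.randn(16, IN_DIM)                      # stand-in for a real batch
    y = torch.randint(0, N_CLASSES, (16,))

    losses, ce_before = distillation_losses(x, y)
    # Training state: current per-knowledge losses plus training progress.
    state = torch.cat([losses.detach(), torch.tensor([step / 200.0])])
    policy = torch.distributions.Bernoulli(probs=torch.sigmoid(actor(state)))
    mask = policy.sample()                           # which knowledge to transfer now
    log_prob = policy.log_prob(mask).sum()

    # Train the student only on the selected knowledge (plus the task loss).
    student_loss = ce_before + (mask * losses).sum()
    opt_student.zero_grad()
    student_loss.backward()
    opt_student.step()

    # Reward for the selection: how much the plain task loss dropped after the update.
    with torch.no_grad():
        _, ce_after = distillation_losses(x, y)
    reward = (ce_before.detach() - ce_after).item()

    # REINFORCE-style actor update with a critic baseline.
    value = critic(state).squeeze()
    advantage = reward - value.item()
    ac_loss = -advantage * log_prob + F.mse_loss(value, torch.tensor(reward))
    opt_ac.zero_grad()
    ac_loss.backward()
    opt_ac.step()
```

The point of the sketch is only the control flow the TL;DR describes: sample a subset of the teacher's knowledge, train the student on that subset, and update the selector from the observed improvement; the paper defines the state, reward, and knowledge types more carefully.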
Abstract
Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically…