Oct 2020
Pre-trained Summarization Distillation
Sam Shleifer, Alexander M. Rush
TL;DR
PEGASUS and BART, the current and former state-of-the-art pre-trained summarization models, are compressed using three different methods for producing a smaller student model (direct knowledge distillation, pseudo-label distillation, and shrink-and-fine-tune, SFT); SFT performs best on the CNN/DailyMail dataset, while pseudo-label distillation works better on the more abstractive XSUM dataset.
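Of the three methods, shrink-and-fine-tune is the simplest: copy a subset of the teacher's parameters into a smaller student and fine-tune it, with no explicit distillation loss. The sketch below illustrates the idea under some assumptions: it uses the Hugging Face transformers library, the public facebook/bart-large-cnn checkpoint, and keeps every other decoder layer; the checkpoint choice and layer indices are illustrative, not the authors' exact recipe.

```python
# Minimal sketch of shrink-and-fine-tune (SFT), assuming the Hugging Face
# transformers library; the checkpoint and kept layer indices are illustrative.
import copy
import torch.nn as nn
from transformers import BartForConditionalGeneration

teacher = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

# Start the student as a full copy of the teacher, then drop decoder layers,
# keeping every other one of the 12 teacher layers.
student = copy.deepcopy(teacher)
kept = [0, 2, 4, 6, 8, 10]
student.model.decoder.layers = nn.ModuleList(
    student.model.decoder.layers[i] for i in kept
)
student.config.decoder_layers = len(kept)

# No explicit distillation loss: the shrunk student is simply fine-tuned
# on the summarization dataset (e.g. CNN/DailyMail) as usual.
```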
Abstract
Current state-of-the-art approaches to summarization utilize large pre-trained transformer models. Distilling these models to smaller student models has become critically important for practical use; however, there are many different distillation methods proposed by the NLP literature.