Oct, 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
TL;DR
Applying knowledge distillation during the pre-training phase can shrink a BERT model by 40% while retaining 97% of its language-understanding ability and running 60% faster. The resulting model, called DistilBERT, delivers good performance for computation on edge devices.
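The core mechanism is knowledge distillation: a smaller student model is trained to reproduce the soft output distribution of the full BERT teacher. The lines below are a minimal sketch of such a soft-target loss in PyTorch; the function name and temperature value are illustrative assumptions, and the paper's full training objective also combines additional terms.

# Sketch of a soft-target distillation loss (illustrative, not the authors' exact code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Student log-probabilities and teacher probabilities, both softened by the temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence between the two distributions; scaled by T^2 so gradient
    # magnitudes stay comparable across temperature settings.
    return F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2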
Abstract
As transfer learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging.
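In practice, a distilled checkpoint of this kind is small enough to load and run with a few lines of code. The sketch below assumes the Hugging Face transformers library and its distilbert-base-uncased checkpoint, neither of which is described in the excerpt above.

# Usage sketch: encode a sentence with a DistilBERT checkpoint (assumes `transformers` is installed).
import torch
from transformers import DistilBertTokenizer, DistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")
model.eval()

inputs = tokenizer("DistilBERT runs well under tight compute budgets.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch size, sequence length, hidden size)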