Oct, 2020
Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia...
TL;DR
We propose AS-DFD, a novel two-stage data-free distillation method for compressing large Transformer-based models (e.g., BERT). It is the first data-free distillation framework designed for NLP tasks, and its effectiveness is verified on text classification datasets.
Abstract
Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, knowledge distillation (KD) has become a popular paradigm to compress a com…
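
For context, the sketch below shows the standard soft-label knowledge-distillation objective that the abstract's mention of KD refers to: a student is trained to match the teacher's temperature-softened output distribution alongside the usual cross-entropy loss. This is a generic illustration, not the paper's data-free AS-DFD procedure; the `temperature` and `alpha` values and the toy dimensions are assumptions made for the example.

```python
# Minimal Hinton-style knowledge-distillation loss (generic KD, not AS-DFD).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend soft-label distillation with the standard cross-entropy loss."""
    # Soften both distributions; the KL term transfers the teacher's soft targets.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * ce

# Toy usage: a batch of 4 examples for 3-class text classification.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(kd_loss(student_logits, teacher_logits, labels))
```

Note that this objective assumes access to labeled training data; the data-free setting addressed by AS-DFD is precisely the case where such data is unavailable.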