Jun, 2023
GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models
Rishabh Agarwal, Nino Vieillard, Piotr Stanczyk, Sabela Ramos, Matthieu Geist...
TL;DR
This paper proposes a generalized knowledge distillation method that addresses the mismatch between the output sequences seen during training and those produced at generation time, and that handles students with insufficient capacity by optimizing alternative divergences. Experiments show that Generalized Knowledge Distillation (GKD) performs strongly when compressing generative language models.
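As a rough illustration of the "alternative divergence" idea, here is a minimal sketch (in PyTorch; the function name, tensor shapes, and reduction are illustrative assumptions, not the authors' code) of a generalized Jensen-Shannon divergence between per-token teacher and student distributions, the kind of interpolated objective GKD permits:

```python
import math

import torch
import torch.nn.functional as F


def generalized_jsd(teacher_logits: torch.Tensor,
                    student_logits: torch.Tensor,
                    beta: float = 0.5) -> torch.Tensor:
    """Generalized Jensen-Shannon divergence between per-token
    distributions (shapes: batch x seq_len x vocab), averaged over
    tokens. Varying beta moves the objective between teacher-anchored
    (forward-KL-like) and student-anchored (reverse-KL-like) behavior;
    beta must lie strictly in (0, 1). Illustrative sketch only.
    """
    # In practice teacher_logits would come from a frozen teacher
    # (e.g. computed under torch.no_grad()); only the student trains.
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    s_logp = F.log_softmax(student_logits, dim=-1)
    # Mixture M = beta * T + (1 - beta) * S, computed stably in log space.
    m_logp = torch.logsumexp(
        torch.stack([t_logp + math.log(beta),
                     s_logp + math.log(1.0 - beta)]),
        dim=0,
    )
    # KL(T || M) and KL(S || M), summed over the vocabulary dimension.
    kl_t = (t_logp.exp() * (t_logp - m_logp)).sum(dim=-1)
    kl_s = (s_logp.exp() * (s_logp - m_logp)).sum(dim=-1)
    return (beta * kl_t + (1.0 - beta) * kl_s).mean()
```

In the on-policy variant described in the TL;DR, these logits would be evaluated on sequences sampled from the student itself, so the training distribution matches what the student actually produces at generation time.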
Abstract
Knowledge distillation is commonly used for compressing neural networks to reduce their inference cost and memory footprint. However, current distillation methods for auto-regressive models, such as …