Feb, 2024
DistiLLM: Towards Streamlined Distillation for Large Language Models
Jongwoo Ko, Sungnyun Kim, Tianyi Chen, Se-Young Yun
TL;DR
DistiLLM is a more effective and efficient knowledge distillation framework for autoregressive language models. By introducing a skew Kullback-Leibler divergence loss and an adaptive off-policy approach, it builds high-performing student models and achieves up to a 4.3x speedup over recent knowledge distillation methods.
Abstract
Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models such as large language models lack a standardized objective function, and the recent reliance on student-generated outputs to address the training-inference mismatch substantially increases computational cost. DistiLLM tackles these issues with a skew Kullback-Leibler divergence loss and an adaptive off-policy approach, yielding high-performing student models with up to a 4.3x speedup over recent KD methods.
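For intuition, below is a minimal sketch of what a skew Kullback-Leibler divergence loss can look like in a distillation setup, assuming the skewed form KL(p || alpha*p + (1 - alpha)*q) with teacher distribution p, student distribution q, and skew coefficient alpha. The function name, tensor shapes, and hyperparameters are illustrative assumptions, not the paper's implementation.

# Sketch of a skew KL divergence loss for teacher-student distillation.
# Assumes the skewed form KL(p || alpha*p + (1 - alpha)*q), where p is the
# teacher token distribution and q is the student token distribution.
import torch
import torch.nn.functional as F

def skew_kl_divergence(teacher_logits: torch.Tensor,
                       student_logits: torch.Tensor,
                       alpha: float = 0.1) -> torch.Tensor:
    """Token-level skew KLD averaged over batch and sequence (illustrative only)."""
    p = F.softmax(teacher_logits, dim=-1)      # teacher distribution
    q = F.softmax(student_logits, dim=-1)      # student distribution
    mix = alpha * p + (1.0 - alpha) * q        # skewed mixture distribution
    # KL(p || mix) = sum_i p_i * (log p_i - log mix_i), clamped for numerical safety
    kl = (p * (p.clamp_min(1e-9).log() - mix.clamp_min(1e-9).log())).sum(dim=-1)
    return kl.mean()

# Usage example: a batch of 4 sequences, length 16, vocabulary of 32k tokens.
teacher_logits = torch.randn(4, 16, 32_000)
student_logits = torch.randn(4, 16, 32_000, requires_grad=True)
loss = skew_kl_divergence(teacher_logits, student_logits, alpha=0.1)
loss.backward()

Mixing a small amount of the teacher distribution into the denominator keeps the loss finite where the student assigns near-zero probability, which is one motivation commonly given for skewed divergences in distillation.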