June 2023
Knowledge Distillation of Large Language Models
Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
TL;DR
This paper proposes MiniLLM, a method that distills smaller language models from generative large language models by minimizing the reverse Kullback-Leibler divergence, which prevents the student model from overestimating the low-probability regions of the teacher distribution. Extensive experiments in instruction-following settings show that MiniLLM models achieve better performance.
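To make the objective concrete, the minimal PyTorch sketch below shows a token-level reverse KLD loss KL(q_student || p_teacher) between student and teacher output distributions; unlike the standard forward KLD KL(p_teacher || q_student), it penalizes the student for placing mass where the teacher's probability is low. This is only an illustration, not the paper's training procedure (which optimizes a sequence-level form of this objective); the function name `reverse_kld_loss` and the toy tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def reverse_kld_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Token-level reverse KL divergence KL(q_student || p_teacher).

    Minimizing this loss is mode-seeking: the student is penalized for
    putting probability mass where the teacher assigns low probability,
    so it avoids overestimating the teacher's low-probability regions.
    """
    log_q = F.log_softmax(student_logits, dim=-1)  # student log-probs
    log_p = F.log_softmax(teacher_logits, dim=-1)  # teacher log-probs
    q = log_q.exp()
    # KL(q || p) = sum_x q(x) * (log q(x) - log p(x)), averaged over positions
    return (q * (log_q - log_p)).sum(dim=-1).mean()

# Toy usage with a hypothetical vocabulary of 8 tokens and 4 positions.
student_logits = torch.randn(4, 8, requires_grad=True)
teacher_logits = torch.randn(4, 8)
loss = reverse_kld_loss(student_logits, teacher_logits)
loss.backward()
```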
Abstract
Knowledge distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models …