March 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang...
TL;DR
This work addresses the underexplored synergy between different data types and loss functions in large language model distillation by proposing DistiLLM-2, a contrastive approach. By increasing the likelihood of teacher responses while decreasing the likelihood of student responses, the method significantly improves student models, yielding strong performance across a variety of tasks and supporting applications such as preference alignment and vision-language extensions.
Abstract
Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types.
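
The TL;DR describes the core recipe: raise the student's likelihood on teacher-generated responses while lowering it on the student's own responses. Below is a minimal PyTorch sketch of one way such a contrastive distillation objective could be written; the pairwise log-sigmoid form, the function name, and the arguments are illustrative assumptions, not the exact DistiLLM-2 loss.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(student_logits_on_teacher_resp,
                                  student_logits_on_student_resp,
                                  teacher_resp_ids,
                                  student_resp_ids,
                                  beta: float = 1.0):
    """Illustrative contrastive objective (hypothetical, not the paper's loss):
    push the student to assign higher likelihood to teacher-generated responses
    and lower likelihood to its own (student-generated) responses.

    Logits have shape (batch, seq_len, vocab); token ids have shape (batch, seq_len).
    """
    # Sequence log-likelihood of the teacher's response under the student model.
    logp_teacher_resp = -F.cross_entropy(
        student_logits_on_teacher_resp.transpose(1, 2), teacher_resp_ids,
        reduction="none").sum(dim=-1)
    # Sequence log-likelihood of the student's own response under the student model.
    logp_student_resp = -F.cross_entropy(
        student_logits_on_student_resp.transpose(1, 2), student_resp_ids,
        reduction="none").sum(dim=-1)
    # Pairwise contrast of the two log-likelihoods (DPO-style form, used here
    # only to illustrate the "raise teacher / lower student" idea).
    return -F.logsigmoid(beta * (logp_teacher_resp - logp_student_resp)).mean()
```

In this sketch the gradient simultaneously increases the student's probability of the teacher response and decreases its probability of its own response, which is the qualitative behavior the summary attributes to DistiLLM-2; the paper's actual loss formulation should be consulted for the precise objective.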