February 2024
Practical Insights into Knowledge Distillation for Pre-Trained Models
Norah Alballa, Marco Canini
TL;DR
Through a comprehensive comparative study of knowledge distillation (KD) techniques, this paper fills a gap in current research by identifying the optimal hyperparameter settings for distilling knowledge from pre-trained models in collaborative and federated learning frameworks, and offers a practical framework that improves model performance while reducing communication rounds and accelerating training.
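To make the hyperparameters mentioned above concrete, the sketch below shows a standard knowledge-distillation loss in PyTorch; the temperature `T` and the mixing weight `alpha` are illustrative examples of the kind of KD settings the paper studies, and the code is not the authors' implementation nor their recommended values.

```python
# Minimal sketch of a knowledge-distillation loss (assumed PyTorch setup).
# T and alpha are hypothetical defaults, shown only to illustrate which
# hyperparameters are typically tuned in KD.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # alpha balances imitation of the teacher against fitting the labels.
    return alpha * soft + (1.0 - alpha) * hard
```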
Abstract
This research investigates the enhancement of knowledge distillation (KD) processes in pre-trained models, an emerging field in knowledge transfer with significant implications for distributed training and …