Feb 2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn...
TL;DR
This paper introduces a consistency-regularized intermediate layer knowledge distillation method that effectively addresses the tendency of other intermediate layer KD methods to overfit, and distills models efficiently.
Abstract
Knowledge distillation (KD) is a highly promising method for mitigating the computational problems of pre-trained language models (PLMs). Among various KD approaches, …
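The TL;DR pairs intermediate-layer distillation with consistency regularization to curb overfitting. Below is a minimal PyTorch sketch of what such a combined objective could look like; the function name `ild_consistency_loss`, the weight `lam`, and the KL-based consistency term are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def ild_consistency_loss(
    student_hidden: torch.Tensor,    # (batch, seq, dim) student intermediate states
    teacher_hidden: torch.Tensor,    # (batch, seq, dim) matched teacher states
    student_logits_a: torch.Tensor,  # logits from one stochastic forward pass
    student_logits_b: torch.Tensor,  # logits from a second pass (e.g., fresh dropout mask)
    lam: float = 1.0,                # hypothetical weight on the consistency term
) -> torch.Tensor:
    # Intermediate-layer distillation: pull student hidden states toward the
    # teacher's (assumes dimensions already match, e.g., via a learned projection).
    ild = F.mse_loss(student_hidden, teacher_hidden)
    # Consistency regularization: penalize disagreement between two stochastic
    # predictions on the same input, discouraging overfitting to any single view.
    cons = F.kl_div(
        F.log_softmax(student_logits_a, dim=-1),
        F.softmax(student_logits_b, dim=-1),
        reduction="batchmean",
    )
    return ild + lam * cons
```

In use, the two logit tensors would come from two forward passes of the student over the same batch with dropout active, so the consistency term rewards stable predictions rather than memorization of one perturbed view.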