BriefGPT.xyz
Oct, 2023
Can a Student Large Language Model Perform as Well as Its Teacher?
Sia Gholami, Marwan Omar
TL;DR
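An overview of knowledge distillation for deep learning models, covering soft labels, temperature scaling, the key factors that determine model performance under distillation, and the technique's potential.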
Abstract
The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has inadvertently introduced deployment challenges in resource-constrained environments.
knowledge distillation