Can a Student Large Language Model Perform as Well as Its Teacher?

Oct, 2023

Can a student Large Language Model perform as well as its teacher?

Sia Gholami, Marwan Omar

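TL;DR: Examines knowledge distillation for deep learning models, covering soft labels, temperature scaling, and the key factors that determine model performance under distillation, along with the technique's potential.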

Abstract

The growing complexity of contemporary deep learning models has delivered unparalleled accuracy, but it has also made them difficult to deploy in resource-constrained environments. Knowledge distillation