November 2024
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
Jiawei Fan, Chao Li, Xiaolong Liu, Anbang Yao
TL;DR
This work addresses the question of how well pre-trained vision transformer (ViT) models can be used as teachers to advance the scalability of cross-architecture knowledge distillation (KD) research. It proposes ScaleKD, a simple and effective KD method that combines three tightly coupled components, significantly improving student performance across a variety of image classification tasks while offering higher training efficiency and larger gains from scaling up the teacher.
Abstract
In this paper, we question if well pre-trained vision transformer (ViT) models could be used as teachers that exhibit scalable properties to advance cross architecture knowledge distillation (KD) research, in the …
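
For context on the distillation setting the abstract refers to, below is a minimal sketch of the classic logit-based KD objective (Hinton et al., 2015), written in PyTorch. This is generic background only, not ScaleKD's actual method (which, per the TL;DR, combines three coupled components); the function name and hyperparameter values are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            T: float = 4.0,
            alpha: float = 0.5) -> torch.Tensor:
    """Classic logit-based KD loss (Hinton et al., 2015).

    Generic background for the teacher-student setup; this is NOT
    the ScaleKD method itself.
    """
    # The teacher provides targets only; no gradients flow into it.
    teacher_logits = teacher_logits.detach()
    # Soft-target term: KL divergence between temperature-softened
    # student and teacher distributions, rescaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a cross-architecture setup like the one the paper studies, one would compute both models' logits on the same batch (e.g., a ViT teacher and a CNN student) and backpropagate this loss through the student only.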