Improving Knowledge Distillation in Transfer Learning with Layer-wise Learning Rates

Jul, 2024

Shirley Kokane, Mostofa Rafid Uddin, Min Xu

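TL;DR: We propose a novel layer-wise learning scheme that adjusts learning parameters per layer. We apply it to attention-map-based and derivative-based transfer learning methods and observe improved learning performance and stability across a wide range of datasets.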
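The central idea named in the title is to give each layer its own learning rate instead of one global rate. A minimal pure-Python sketch of that general idea follows. The geometric schedule in `layerwise_lrs` and the plain SGD update in `sgd_step` are hypothetical choices made for illustration only. They are not the authors' method, whose rule for setting per-layer rates is not described in this section.

```python
from typing import List


def layerwise_lrs(num_layers: int, base_lr: float, decay: float) -> List[float]:
    """Return one learning rate per layer.

    Hypothetical schedule: layer i gets base_lr * decay**i. This is an
    illustrative assumption, not the paper's per-layer rule.
    """
    return [base_lr * decay ** i for i in range(num_layers)]


def sgd_step(params: List[List[float]],
             grads: List[List[float]],
             lrs: List[float]) -> List[List[float]]:
    """One plain SGD step in which layer i is updated with its own rate lrs[i]."""
    if not (len(params) == len(grads) == len(lrs)):
        raise ValueError("need one gradient list and one rate per layer")
    return [
        [w - lr * g for w, g in zip(layer_w, layer_g)]
        for layer_w, layer_g, lr in zip(params, grads, lrs)
    ]


# Usage: two toy "layers"; the second is updated at half the first layer's rate.
rates = layerwise_lrs(num_layers=2, base_lr=0.1, decay=0.5)  # [0.1, 0.05]
updated = sgd_step([[1.0, 2.0], [0.5]], [[0.1, 0.2], [1.0]], rates)
```

In PyTorch, the same effect is commonly obtained with per-parameter-group options: pass the optimizer (for example `torch.optim.SGD`) a list of dicts, one per layer, each of the form `{"params": layer.parameters(), "lr": lr}`.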

Abstract

Transfer learning methods start performing poorly as the complexity of the learning task increases. Most of these methods compute the cumulative difference over all the matched features and then back-propagate that single loss through all the layers. Contrary to these method