Knowledge distillation (KD) enhances the performance of a student network by allowing it to incrementally learn the knowledge transferred from a teacher network. Existing methods dynamically adjust the temperature so that the student network can adapt to the varying learning difficulties at different stages of KD. KD is a continuous process, but when adjusting the temperature, these methods consider only the immediate benefit of the adjustment in the current learning phase and ignore its future returns. To address this issue, we formulate temperature adjustment as a sequential decision-making task and propose a method based on reinforcement learning, termed RLKD. Importantly, we design a novel state representation that enables the agent to take more informed actions (i.e., instance temperature adjustments). To handle the delayed rewards that arise in our method from the KD setting, we explore an instance reward calibration approach. In addition, we devise an efficient exploration strategy that enables the agent to learn a valuable instance temperature adjustment policy more efficiently. Our framework can serve as a plug-and-play technique that is easily inserted into various KD methods, and we validate its effectiveness on both image classification and object detection tasks. Our code is at https://github.com/Zhengbo-Zhang/ITKD
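
To make the notion of an instance temperature concrete, the following is a minimal sketch of a distillation loss in which every training instance is softened with its own temperature. It is not taken from the released code: the function names `softmax` and `instance_kd_loss` are hypothetical, the loss is assumed to be the common temperature-scaled KL divergence between teacher and student outputs with the usual T^2 weighting, and the `temperatures` argument simply stands in for whatever values the agent would output.

```python
import math


def softmax(logits, temperature):
    """Temperature-softened softmax over one instance's logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def instance_kd_loss(student_logits, teacher_logits, temperatures):
    """Mean over instances of T_i^2 * KL(teacher_i || student_i).

    Assumption: the standard temperature-scaled KD loss, applied with a
    separate temperature T_i per instance. Each argument holds one entry
    per instance.
    """
    losses = []
    for s, t, temp in zip(student_logits, teacher_logits, temperatures):
        p = softmax(t, temp)  # softened teacher distribution
        q = softmax(s, temp)  # softened student distribution
        kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
        losses.append(temp ** 2 * kl)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    student = [[0.2, 1.5, -0.3], [2.0, 0.1, 0.4]]
    teacher = [[0.1, 2.5, -1.0], [3.0, -0.5, 0.2]]
    # Hypothetical per-instance temperatures, e.g. produced by a policy.
    print(instance_kd_loss(student, teacher, temperatures=[2.0, 6.0]))
```

A higher temperature flattens both distributions for that instance, so the choice of each temperature changes how strongly the teacher's non-target class probabilities influence the student.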

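Knowledge distillation (KD) improves the performance of a student network by allowing it to incrementally learn the knowledge transferred from a teacher network. We propose RLKD, a method based on reinforcement learning that treats temperature adjustment as a sequential decision-making task and uses a novel state representation to enable the agent to take more informed actions (i.e., instance temperature adjustments). Our method addresses the delayed rewards caused by the KD setting and adopts an efficient exploration strategy. Our framework can be easily inserted into various KD methods, and we validate its effectiveness on image classification and object detection tasks.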

Instance Temperature Knowledge Distillation