Large Language Models (LLMs) have displayed remarkable performance across various complex tasks by leveraging Chain-of-Thought (CoT) prompting. Recent studies have proposed a Knowledge Distillation (KD) approach, reasoning distillation, which transfers this reasoning ability by fine-tuning language models on multi-step rationales generated by LLM teachers. However, these studies have inadequately considered two challenges arising from the insufficient distillation sets provided by the LLM teacher model: 1) data quality and 2) soft label provision. In this paper, we propose Mentor-KD, which effectively distills the multi-step reasoning capability of LLMs into smaller LMs while addressing both challenges. Specifically, we exploit a mentor, an intermediate-sized, task-specific fine-tuned model, to generate additional CoT annotations and provide soft labels for the student model during reasoning distillation. We conduct extensive experiments and confirm Mentor-KD's effectiveness across various models and complex reasoning tasks.
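
To make the role of soft labels concrete, the following is a minimal sketch of a generic soft-label distillation objective for a single token position. It is not taken from the paper: the abstract does not state Mentor-KD's exact loss, so the function names, the mixing weight `lam`, and the `temperature` parameter are all illustrative assumptions based on the standard knowledge-distillation formulation (cross-entropy on the annotated token plus a KL term against the mentor's softened distribution).

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Hard-label loss: negative log-probability of the annotated rationale token."""
    return -math.log(softmax(student_logits)[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, mentor_logits, target_index,
                      lam=0.5, temperature=2.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    (1 - lam) * hard-label cross-entropy
      + lam * KL(mentor soft labels || student distribution)
    """
    hard = cross_entropy(student_logits, target_index)
    mentor_probs = softmax(mentor_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    soft = kl_divergence(mentor_probs, student_probs)
    return (1 - lam) * hard + lam * soft


if __name__ == "__main__":
    # Toy three-token vocabulary; index 2 is the annotated next token.
    student = [1.0, 2.0, 3.0]
    mentor = [0.5, 1.0, 4.0]
    print(distillation_loss(student, mentor, target_index=2))
```

In this sketch the soft term vanishes when the student's distribution matches the mentor's, and setting `lam=0` recovers plain fine-tuning on the rationale tokens alone.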

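This work addresses the shortcomings of existing multi-step reasoning knowledge distillation methods in data quality and soft label provision. The proposed Mentor-KD method uses an intermediate-sized, task-specific fine-tuned model to augment Chain-of-Thought annotations and provide soft labels for the student model, thereby effectively improving the reasoning ability of small language models. Experimental results show that Mentor-KD performs well across a variety of models and complex reasoning tasks.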

Mentor Knowledge Distillation: Improving the Multi-Step Reasoning Ability of Small Language Models