The surge in Large Language Models (LLMs) has revolutionized natural language
processing, but fine-tuning them for specific tasks often forces a trade-off
between task performance and general instruction-following ability. In this
paper, we posit that the distribution gap between task datasets and the LLMs
is the primary underlying cause. To address the
problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach
that bridges the distribution gap by guiding fine-tuning with a distilled
dataset generated by the model itself to match its original distribution.
Experimental results on the Llama-2-chat model across various benchmarks
demonstrate that SDFT effectively mitigates catastrophic forgetting while
achieving downstream-task performance comparable or superior to that of
vanilla fine-tuning. Moreover, SDFT shows the potential to maintain
the helpfulness and safety alignment of LLMs. Our code is available at
https://github.com/sail-sg/sdft.

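Using Self-Distillation Fine-Tuning (SDFT), this work bridges the distribution
gap between task datasets and large language models by introducing a distilled
dataset generated by the model itself, addressing the tension between task
performance and general instruction-following ability when fine-tuning on
specific tasks. Across multiple benchmarks, SDFT mitigates catastrophic
forgetting while achieving downstream-task performance comparable or superior
to that of conventional fine-tuning, and it also shows the potential to
preserve the helpfulness and safety of LLMs.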
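
The self-distillation step described above can be illustrated with a minimal
sketch. It is not the released implementation (see the repository linked in
the abstract for that). It assumes that the model is asked to rewrite each
reference answer in its own words and that the original answer is kept
whenever the rewrite fails a check. The prompt wording, the `generate`
callable, and the `keep_rewrite` check are all hypothetical names introduced
here for illustration.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]

# Hypothetical prompt template; the wording used in practice may differ.
REWRITE_TEMPLATE = (
    "Instruction:\n{instruction}\n\n"
    "Reference answer:\n{response}\n\n"
    "Rewrite the reference answer in your own words while keeping it correct."
)


def non_empty(original: str, rewritten: str) -> bool:
    """Default check (an assumption): accept any non-blank rewrite."""
    return bool(rewritten.strip())


def build_distilled_dataset(
    examples: List[Example],
    generate: Callable[[str], str],
    keep_rewrite: Callable[[str, str], bool] = non_empty,
) -> List[Example]:
    """Replace each reference answer with the model's own rewrite.

    `generate` stands in for a call to the model being fine-tuned: it maps
    a prompt string to the model's text output. If `keep_rewrite` rejects
    a rewrite, the original reference answer is kept unchanged.
    """
    distilled = []
    for ex in examples:
        prompt = REWRITE_TEMPLATE.format(
            instruction=ex["instruction"], response=ex["response"]
        )
        rewritten = generate(prompt)
        if keep_rewrite(ex["response"], rewritten):
            target = rewritten
        else:
            target = ex["response"]
        distilled.append({"instruction": ex["instruction"], "response": target})
    return distilled
```

Fine-tuning would then proceed on the returned pairs instead of the original
task dataset. A stricter `keep_rewrite`, for example one that checks that the
final answer still appears in the rewrite, is one way to avoid training on
rewrites that changed the answer.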