Supervised fine-tuning (SFT) is a crucial step for large language models
(LLMs), enabling them to align with human instructions and enhance their
capabilities in downstream tasks. When models must align with a broader range
of downstream tasks, or when performance on a specific task must improve
notably, a substantial increase in fine-tuning data often emerges as the
solution. However, we find that large-scale increases in
instruction data can disrupt the world knowledge previously stored in the LLMs,
i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to
address this challenge. LoRAMoE is a plugin version of Mixture of Experts
(MoE). The plugin form preserves the integrity of world knowledge by freezing
the backbone model during the training phase. We further propose localized
balancing constraints that coordinate a portion of the experts for task
utilization while enabling the other experts to fully leverage the world
knowledge stored in the models. Experimental results demonstrate that LoRAMoE
can reasonably coordinate experts based on data type during inference, and that
even dramatically increasing instruction data does not result in knowledge
forgetting. Moreover,
LoRAMoE provides additional benefits for the performance of downstream tasks,
indicating the potential of our approach for multi-task learning.

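LoRAMoE is a plugin-style Mixture of Experts model. By freezing the backbone model during training, it preserves the integrity of the world knowledge stored in the model. It uses localized balancing constraints to balance task utilization across experts while letting the other experts make effective use of the world knowledge stored in the model. Experiments show that LoRAMoE can reasonably coordinate experts during inference, and that even scaling up the instruction data does not lead to knowledge forgetting. In addition, LoRAMoE provides additional benefits for downstream task performance, showing the potential of our approach for multi-task learning.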
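
To make the plugin form concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that each expert is a low-rank (LoRA-style) update B_i A_i added to a frozen backbone weight W, and that a softmax router mixes the experts per input. The class name `LoRAExpertLayer`, the parameters `n_experts` and `rank`, the random initialization, and the zero initialization of B are all illustrative assumptions. The localized balancing constraint is not modeled here.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for real weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAExpertLayer:
    """Hypothetical layer: frozen weight W plus router-weighted low-rank experts."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, seed=0):
        rng = random.Random(seed)
        # Backbone weight: treated as frozen, never returned as trainable.
        self.W = rand_matrix(d_out, d_in, rng)
        # Expert i contributes B[i] @ A[i] @ x, a rank-`rank` update.
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        # B starts at zero so the layer initially equals the backbone (assumption).
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]
        # Linear router producing one logit per expert.
        self.router = rand_matrix(n_experts, d_in, rng)

    def gate(self, x):
        """Softmax routing weights over experts for input x."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        """Frozen backbone output plus the gated sum of expert updates."""
        out = matvec(self.W, x)
        for g, A, B in zip(self.gate(x), self.A, self.B):
            delta = matvec(B, matvec(A, x))
            out = [o + g * d for o, d in zip(out, delta)]
        return out

    def trainable_parameters(self):
        """Only the plugin parameters would be updated; W is excluded."""
        return {"A": self.A, "B": self.B, "router": self.router}


layer = LoRAExpertLayer(d_in=3, d_out=2)
x = [1.0, -2.0, 0.5]
weights = layer.gate(x)
y = layer.forward(x)
```

In this sketch a training loop would update only the entries returned by `trainable_parameters()`, so the backbone weight `W` stays exactly as it was before fine-tuning.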