Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins to develop a comprehensive range of skills, such as writing, reasoning, chatting, coding, and more. Each skill has unique characteristics, and these datasets are often heterogeneous and imbalanced, making the fine-tuning process highly challenging. Balancing the development of each skill while ensuring the model maintains its overall performance requires sophisticated techniques and careful dataset curation. In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process. This framework ensures the optimal comprehensive skill development of LLMs by dynamically adjusting the focus on different datasets based on their current learning state. To validate the effectiveness of MoS, we conduct extensive experiments using three diverse LLM backbones on two widely used benchmarks and demonstrate that MoS substantially enhances model performance. Building on the success of MoS, we propose MoSpec, an adaptation for task-specific fine-tuning, which harnesses the utilities of various datasets for a specific purpose. Our work underlines the significance of dataset rebalancing and present MoS as a powerful, general solution for optimizing data usage in the fine-tuning of LLMs for various purposes.

我们提出了一种通用的、模型无关的强化学习框架 Mixture-of-Skills (MoS)，它能在微调过程中自动优化数据使用，以实现大型语言模型的全面技能发展。我们通过在两个广泛使用的基准测试上进行大量实验证明 MoS 显著提高了模型性能，同时在任务特定微调方面，我们提出了一种适应性技术 MoSpec，为特定目的利用各种数据集的效用。我们的工作强调了数据集的再平衡的重要性，并将 MoS 提出为优化大型语言模型微调过程中数据使用的强大通用解决方案。

技能混合：学习为优化大型语言模型的数据使用进行微调